ABSTRACT


The U.S. Navy’s Transformation Roadmap is leading the fleet in a smaller, faster, and more technologically advanced direction. Smaller platforms and reduced manpower resources create opportunities to fill important positions, including ship-handling control, with technology.

This thesis investigates the feasibility of using commercial-off-the-shelf (COTS) speech recognition software (SRS) for conning a Navy ship. Dragon NaturallySpeaking Version 6.0 software and a SHURE wireless microphone were selected for this study. An experiment with a limited number of subjects was conducted at the Marine Safety International ship-handling simulation facility in San Diego, California, to measure the software error rate during conning operations. Data analysis sought to determine the types and significant causes of error, considering factors such as iteration number, subject, scenario, setting, and ambient noise. The significance of these factors provides key insights for future experimentation.

The COTS technology selected for this study proved promising in overcoming irregularities particular to conning, but its vocabulary and grammar were problematic. The use of SRS for conning ships merits additional research using a limited lexicon and a modified grammar that supports conning commands. Cooperative research between the Navy and industry could produce the “Helmsman” of the future.
I. INTRODUCTION


A. VOICE ACTIVATED COMMAND SYSTEM

This thesis focuses on speech exchanges during ship control processes and specifically considers the potential of Commercial-Off-The-Shelf (COTS) voice recognition software, as part of a Voice Activated Command System (VACS), to replace Sailors in this process. VACS is a complex, multifaceted, automated system designed to perform the functions of the Helmsman, who adjusts the ship’s rudder angle, and the Lee Helmsman, who adjusts the ship’s engine speed. VACS uses speech recognition software to identify the Conning Officer’s commands and transmit them to software programs that interface with the rudder and engines.

Voice recognition software, also referred to as speech recognition (SR) software, is a vital part of the VACS, because the rudder and engine applications would rely on accurate input from it. COTS voice recognition software is currently available for evaluation and is a prospective technology for conning U.S. warships. This study reviews the potential strengths and weaknesses of the selected software in a Voice Activated Command System, along with design considerations and recommendations for future research.

B. BACKGROUND

For centuries, speech has been the primary form of communication in controlling a ship’s maneuvers, and it remains so today. Speech can be used at a distance, which makes it ideal for hands-busy and eyes-busy situations. The enduring truth about verbal communication is that the receiver, a Helmsman, must successfully interpret the information passed from the person responsible for maneuvering the ship. The message or command must be clear and concise, using a vocabulary common to both parties.
During the 17th and 18th centuries, the ship’s Captain ordered adjustments of the sails to gain speed. He passed a verbal order down the chain of command, and the appropriate Sailor changed the rigging. Later, the Captain delegated these duties to Conning Officers, who were responsible for ordering shipboard maneuvers. Despite technological improvements in exchanging important information, such as wireless computers using Voice over Internet Protocol, the dynamics of ship maneuvering have not changed: a Conning Officer still voices commands to a Helmsman, who converts them into action. Changes in transmission media have, however, led to more effective, convenient, or efficient processes for performing key tasks. These changes include the development of Voice Activated Systems (VAS) (Figure 1), computer software that activates machines using the human voice. Speech recognition software transforms sound waves from the voice into digital bits. An interface then interprets them as commands and converts them to mechanical or electrical signals. The resulting signals are relayed to the rudder and engines to adjust the rudder angle and engine speed accordingly.

Figure 1. Simple Voice Activated System.
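To make the three stages of a VAS concrete (recognition, interpretation as a command, and conversion to a control signal), the following Python sketch wires them together. It is a minimal illustration under stated assumptions, not the design of any fielded system: the recognizer is a stub that receives text rather than audio, the phrase tables are a small hypothetical subset of conning commands, and the rudder angles (15, 30 and 35 degrees) are illustrative values that vary by ship.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative rudder angles in degrees; actual values vary by ship.
RUDDER_ANGLES = {"standard": 15.0, "full": 30.0, "hard": 35.0}

# Hypothetical engine orders mapped to a normalized throttle setting (0 to 1).
ENGINE_ORDERS = {
    "all stop": 0.0,
    "all ahead one third": 1 / 3,
    "all ahead two thirds": 2 / 3,
    "all ahead standard": 1.0,
}


@dataclass
class ControlSignal:
    target: str   # "rudder" or "engine"
    value: float  # rudder degrees (+ right, - left) or throttle fraction


def recognize(audio: str) -> str:
    """Stub for the speech recognizer: the 'audio' here is already text."""
    return " ".join(audio.lower().split())


def interpret(text: str) -> Optional[ControlSignal]:
    """Map a recognized phrase to a control signal, or None if not a command."""
    if text in ENGINE_ORDERS:
        return ControlSignal("engine", ENGINE_ORDERS[text])
    words = text.split()
    if len(words) == 3 and words[0] in ("right", "left") and words[2] == "rudder":
        angle = RUDDER_ANGLES.get(words[1])
        if angle is not None:
            return ControlSignal("rudder", angle if words[0] == "right" else -angle)
    return None


def actuate(signal: ControlSignal) -> str:
    """Stub for the interface that drives the rudder or engines."""
    return f"{signal.target} -> {signal.value:g}"


def vas_pipeline(audio: str) -> str:
    """Run one utterance through recognition, interpretation and actuation."""
    signal = interpret(recognize(audio))
    return actuate(signal) if signal is not None else "command not understood"
```

Keeping interpretation separate from recognition means a misrecognized or nonstandard phrase yields "command not understood" rather than a rudder or engine signal.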
Over the last decade, VAS have become more common and more in demand. They are most common in the telephone industry, but as the technology matures, their use is spreading to new areas. The technology routinely responds to people speaking key words: telephones dial a caller’s spoken number, and businesses automate transactions via computer-generated dialogues. Persons with disabilities are gaining personal freedom and a sense of accomplishment by using Voice-activated Environmental Control Units, which enable them to control a full range of electrical household items simply by giving verbal commands. [Ref. 1] The same voice technology that turns lights or alarm systems on and off can make a valuable contribution to Navy systems.

Driving, or conning, a ship is a prime example of a human interaction that evolved around and through speech, and one where a Voice Activated System could be instrumental. The Conning Officer gives a standardized verbal command, and the Helmsman or Lee Helmsman responds with a formal verbal acknowledgement followed by a verbal update of the ship’s status. To conclude the sequence, the Conning Officer states an understanding of the ship’s status. Conning a ship is manpower intensive and subject to human error, which VAS may help alleviate.
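The four-step exchange described above (command, acknowledgement, status report, and the Conning Officer’s confirmation) behaves like a small protocol that a VACS would have to follow. The sketch below models it as a state machine in Python; the role names and the sample phrases in the usage are illustrative assumptions, not quotations from Navy doctrine.

```python
from enum import Enum, auto


class Step(Enum):
    AWAIT_COMMAND = auto()
    AWAIT_ACK = auto()
    AWAIT_REPORT = auto()
    AWAIT_CONFIRM = auto()


# Who is expected to speak at each step of the exchange.
EXPECTED_SPEAKER = {
    Step.AWAIT_COMMAND: "conning_officer",
    Step.AWAIT_ACK: "helmsman",
    Step.AWAIT_REPORT: "helmsman",
    Step.AWAIT_CONFIRM: "conning_officer",
}

# After confirmation the exchange returns to waiting for the next command.
NEXT_STEP = {
    Step.AWAIT_COMMAND: Step.AWAIT_ACK,
    Step.AWAIT_ACK: Step.AWAIT_REPORT,
    Step.AWAIT_REPORT: Step.AWAIT_CONFIRM,
    Step.AWAIT_CONFIRM: Step.AWAIT_COMMAND,
}


def run_exchange(utterances):
    """Walk (speaker, phrase) pairs through the exchange.

    Returns a log of (step name, phrase) pairs, or raises ValueError
    when someone speaks out of turn.
    """
    step = Step.AWAIT_COMMAND
    log = []
    for speaker, phrase in utterances:
        if speaker != EXPECTED_SPEAKER[step]:
            raise ValueError(f"{speaker} spoke out of turn at {step.name}: {phrase!r}")
        log.append((step.name, phrase))
        step = NEXT_STEP[step]
    return log
```

For example, the sequence ("conning_officer", "Right standard rudder"), ("helmsman", "Right standard rudder, aye"), ("helmsman", "My rudder is right standard"), ("conning_officer", "Very well") completes all four steps, while a report arriving before any command is rejected.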

C. SIGNIFICANCE TO THE U.S. NAVY

The U.S. Navy faces numerous challenges, now and in the future, and stands at the threshold of many significant changes. “Our goal is to move our military from service-centric forces armed with unguided munitions and combat formations that are large and easily observable, manpower intensive, earth-bound capabilities, and transform a growing portion into rapidly-deployable joint forces made up of less manpower intensive combat formations....” [Ref. 2]

One of the most apparent and serious challenges is how to perform all mission requirements with a smaller force. Manpower reductions occurred steadily throughout the 1990s, creating personnel shortages on naval platforms. To meet its future objectives, the Navy is evaluating methods to reduce manning on each platform so that more ships may be put into service without increasing overall personnel end strength. A larger number of smaller, less manpower-intensive ships may be dispersed across multiple theaters simultaneously, filling different mission requirements to meet the multitude of diverse threats to U.S. interests. Innovative techniques that reduce ship manning without sacrificing readiness or jeopardizing the mission would greatly benefit the Navy, especially since manpower-related expenses consume approximately 60% of its budget. [Ref. 3]

Department of Defense (DoD) and Navy leaders seek less expensive, more productive, and more effective approaches to resolve this issue. The Secretary of the Navy stated that one immediate goal is to “explore innovative manning initiatives such as the Optimum Manning program, which relies on new technologies and creative leadership to reduce ship manning.” [Ref. 4] Optimum Manning program prototypes are in place aboard USS MILIUS and USS MOBILE BAY. On board MILIUS, the Optimum Manning program, part of the Smart Ship concept, operates with an “optimal crew size of just 232, almost 20% less crew than the usual complement for an Arleigh Burke-class guided missile destroyer.” [Ref. 5] MILIUS and MOBILE BAY report success with an optimal crew by introducing new technology together with new policies and procedures, characteristic of the Navy’s transformation. The advances on these ships open the door for Navy officials to research the feasibility of designing new ships with VACS and retrofitting current ships with it.

The Voice Activated Command System (VACS) has the potential to reduce shipboard watch-standing and maintenance manpower requirements. VACS may substitute for the Helm and Lee Helm positions. On smaller platforms, this means eliminating at least one watch-stander and as many as three: the Helm, the Lee Helm, and the Helm Safety Officer. This reduction allows personnel in less skilled roles to be redistributed to highly skilled technical or decision-making billets on board a warship such as the Littoral Combat Ship (LCS).
The Navy’s future LCS is a multi-mission surface combatant designed for operation within 100 miles of land. LCS concepts require a smaller, faster, and more versatile vessel than its predecessors. LCS manning is projected to be severely reduced compared with current standards. The smaller crew heightens the need to ensure that every possible member is performing mission-critical tasks. One assumption regarding the design of the LCS is that it will leverage as much technology as possible to meet the proposed manning level.

Currently, the Helm watch is posted twenty-four hours a day, seven days a week, while a ship is underway. This requires three helmsmen on eight-hour shifts without any time off. Given a crew of 35 to 50 and the helm manned around the clock in eight-hour shifts, approximately six to eight and a half percent of the crew drives the ship full time, not including the Conning Officer. Manning at this minimal level risks fatigue, provides little redundancy, and leaves no room to train replacement personnel. Adding depth to this watch station would require more helmsmen, as much as doubling the manpower requirement.
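The percentages above follow from simple arithmetic: continuous coverage in eight-hour shifts needs 24 / 8 = 3 helmsmen, and 3 of 50 is 6 percent while 3 of 35 is about 8.6 percent, the upper figure rounded in the text to eight and a half. A short Python check, with the function name chosen here for illustration:

```python
def helm_share(crew_size, hours_per_day=24, shift_hours=8):
    """Percentage of the crew needed to man the helm around the clock."""
    helmsmen = hours_per_day // shift_hours  # 3 watch sections, no relief
    return 100 * helmsmen / crew_size


for crew in (35, 50):
    print(f"{crew}-person crew: {helm_share(crew):.1f}% of the crew on the helm")
```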

Navy leaders and ship designers are exploring technological alternatives to reduce shipboard manning requirements. One potential area is a VACS that interacts with the Ship System Control segment of the Integrated Bridge on the Littoral Combat Ship. For example, using VACS aboard LCS would eliminate the Helmsman watch station, allowing a significant portion of the crew to concentrate on other, more skilled duties. The deployment of a well-designed, technologically advanced LCS will greatly enhance Littoral Sea Control and assist the Navy’s transformational programs.

The Naval Transformation Roadmap (NTR) and Joint Vision 2020 (JV 2020) describe strategies, concepts, initiatives, and programs considered crucial in transforming the Department of Defense and the Navy in particular. The following quote emphasizes the need for technologically advanced, automated warships such as the LCS:
This transformation is motivated by a vastly different security environment that has emerged over the last decade. Where once a single monolithic threat — the Soviet Union — dominated the nation’s security planning and programming, today’s environment contains a broader, more diffuse set of concerns: terrorism, biological warfare, regional tension, and an array of other transnational challenges. [Ref. 6]

As stated previously, the need for an LCS drives the need for VACS. Both NTR and JV 2020 stress the Navy’s need for interagency cooperation and technological change. One major theme of the Transformation Roadmap is “...inserting technology to carry out operations in ways that profoundly improve current capabilities and develop desired future capabilities.” [Ref. 7] VACS addresses that requirement: it can offer an effective, less manpower-intensive option for maneuvering a ship, one to which personnel can relate and adapt quickly with minimal disruption to the current modus operandi.

Essentially, the technical and operational transition is possible because the VACS may be designed to use the same inputs as a human helmsman. Experimentation must demonstrate that the VACS software ensures conning commands are delivered in the correct format and that the order given is the most appropriate for the intended maneuver. Unlike people, a computer does not interpret commands delivered in an incorrect format, nor does it adjust orders that do not exactly match what the Conning Officer intended. Conning Officers must therefore use the standard command set to match the system’s predefined vocabulary.
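The point that a computer will not reinterpret nonstandard phrasing can be illustrated with a strict matcher. The Python sketch below accepts only an exact, hypothetical subset of standard rudder commands and rejects everything else, including a standard command with an extra word; the command list is an assumption for illustration, not the full Navy command set or the grammar used in this thesis.

```python
import re

# Hypothetical subset of standard rudder commands.
ANGLES = r"(?:five|ten|fifteen|twenty five|twenty|thirty)"
RUDDER_COMMAND = re.compile(
    r"(?:right|left) (?:standard|full|hard) rudder"
    rf"|(?:right|left) {ANGLES} degrees rudder"
    r"|rudder amidships"
)


def accept(utterance: str) -> bool:
    """Return True only if the whole utterance is a standard rudder command."""
    normalized = " ".join(utterance.lower().split())  # ignore case and spacing
    return RUDDER_COMMAND.fullmatch(normalized) is not None
```

A human helmsman would understand "come right a little"; this matcher, like the system described above, simply rejects it, and it also rejects "right standard rudder please" because `fullmatch` requires the entire utterance to fit the pattern.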

The system supports future capabilities as part of the FORCEnet architecture, an all-inclusive maritime network intended to provide combatants with all necessary information and support in real time. As an integral part of the Littoral Combat Ship, VACS supports the Sea Shield and Homeland Defense strategies. The use of smaller, more agile craft with smaller crews, together with the need for reliability and precision, makes VACS a strong candidate solution for fleet operations. VACS can reduce manpower needs, thereby reducing the number of Service personnel exposed to “decisive points in battle or in other operations, or to be exposed to conditions of great danger and hardship”. [Ref. 8]

D. MISSION NEED

A state-of-the-art Ship Control System that makes efficient use of technology enables improved command and control of U.S. Navy surface vessels and diverts manning to other shipboard war-fighting requirements. The main objective of a Voice Activated Command System is to replace the Helmsman and Lee Helmsman. VACS is intended to respond to conning commands in the same manner as a Helmsman: providing feedback and updates, and performing its primary mission of transmitting the appropriate control signals to the rudder or engines.

The Voice Activated Command System must meet four overarching criteria: reliability, multiple-user capability, speaker verification, and noise dampening capability. Each criterion is vital for use on a U.S. warship to ensure that malfunctioning software, misinterpreted commands, or simply missed orders to the helm do not create additional complications, especially given the inherent dangers and hazards of shipboard maneuvering.

Reliability is defined as the capacity of the VACS to recognize and accurately relay commands. The required level of reliability and accuracy for this system is near perfect. Ship handlers’ confidence in system operability is essential, since only full confidence in the software leads to operational implementation. Use of unproven technology invites unnecessary risk, and technology judged unreliable collects dust while Sailors continue to use antiquated, more costly, but proven processes. Most important, even a momentary system failure could harm the ship or crew, costing millions of dollars in repairs or, worse, Sailors’ lives.
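Why "near perfect" is the right standard can be seen from a simple calculation. If each command is misrecognized independently with probability p, the chance of at least one error in n commands is 1 - (1 - p)^n. The sketch below evaluates this for a hypothetical watch of 200 commands; both the independence assumption and the command count are illustrative, not measured values from this study.

```python
def p_any_error(p: float, n: int) -> float:
    """Probability of at least one misrecognition in n independent commands."""
    return 1 - (1 - p) ** n


for p in (0.01, 0.001, 0.0001):
    print(f"per-command error {p:.2%}: "
          f"{p_any_error(p, 200):.1%} chance of an error in 200 commands")
```

Under these assumptions, even a 1 percent per-command error rate makes at least one error during such a watch more likely than not (about 87 percent), whereas 0.01 percent brings it to about 2 percent.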
Ship control duties rotate among multiple users and must transition quickly and smoothly from one user to another. The SRS must recognize the speech patterns, inflections, and accents of each individual user. Several different Conning Officers assume the watch on each ship, creating the need to accommodate a pool of watch-standers, one at a time. Watches are set for limited periods to maintain awareness and to reduce mental and physical fatigue. These factors increase the number of VACS users, and with it the need for the software to respond accurately to multiple users. The ability to respond to a number of distinct users must be balanced by the requirement to accept only the responsible individual’s commands.

Speaker verification, or authentication, ensures that the VACS software listens only to the authorized Conning Officer on watch. In addition to the Conning Officer, an Officer of the Deck (OOD) oversees all maneuvering and seamanship duties. The OOD is the Commanding Officer’s direct representative, and the VACS must be programmed to respond to an emergency order from the OOD but otherwise to disregard that voice, even when it states a standard command, acknowledging and executing only the commands of the currently authorized Conning Officer. Speaker verification also allows user permissions to be set, such as a hierarchy of emergency or safety overrides. The Commanding Officer and Executive Officer require the ability to negate, interrupt, or override commands given by officers with subordinate permissions. As specified by regulations or standards, officers with more qualified permissions may also be allowed to interrupt or override commands given by subordinate officers. Under the current hierarchical structure, most officers would not be allowed to override any other officer.
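The permission rules above can be expressed as a short decision function. In the following Python sketch, speaker verification is assumed to have already produced a speaker identity, a role, and a pass/fail result; the role names, the identifiers, and the choice that the CO and XO may override at any time while the OOD is heard only for emergency orders are illustrative assumptions drawn from this paragraph, not a specified VACS design.

```python
from dataclasses import dataclass

# Hypothetical permission groups drawn from the paragraph above.
OVERRIDE_ANY_TIME = {"commanding_officer", "executive_officer"}
EMERGENCY_ONLY = {"officer_of_the_deck"}


@dataclass
class VerifiedUtterance:
    speaker_id: str        # identity reported by speaker verification (hypothetical)
    role: str              # billet associated with that identity
    verified: bool         # True if the voice matched the enrolled profile
    emergency: bool = False
    text: str = ""


def should_execute(u: VerifiedUtterance, conning_officer_id: str) -> bool:
    """Decide whether VACS acts on an utterance under the rules above."""
    if not u.verified:
        return False              # voices that fail verification are ignored
    if u.speaker_id == conning_officer_id:
        return True               # the officer currently holding the conn
    if u.role in OVERRIDE_ANY_TIME:
        return True               # CO and XO may override subordinates
    if u.role in EMERGENCY_ONLY:
        return u.emergency        # OOD is heard only for emergency orders
    return False                  # other officers cannot override
```

Checking identity against the current Conning Officer, rather than only the role, is what lets the system ignore a second qualified Conning Officer who is present on the bridge but not on watch.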

The Voice Activated Command System has a few constraints associated with its implementation. The system requires each user to record voice and speech patterns prior to use, thereby training it to understand the specific voices stored in its database; it will respond solely to those voices. The logistics of installing and maintaining the system will require that information system technicians be available at all times in case of emergency. Another crucial element is the need for a manual override system, a backup system, and an alternate power supply in response to malfunctions or emergencies that could prevent proper operation. A further constraint is the operational environment: like any other shipboard system, VACS requires sufficient casing to ensure that the weather (e.g., salt air, lightning strikes, or other such problems) does not affect the circuitry. Finally, speech recognition, more than other modalities, invites anthropomorphism. It has been documented that users tend to overestimate the capabilities of a system that uses a speech interface and are more tempted to treat the device as another person. [Ref. 9]

Alternatives to VACS are interesting but have significant drawbacks. One option is not to install VACS and to maintain the status quo, but this does not allow the manpower reductions established in the Navy’s plans and vision. A Non-voice Activated Command System (NACS) requires the operator to input data manually. Three designs are under consideration: a console, a wristwatch, and a helmet. The primary drawback of NACS is that it does not mirror the current process, so Conning Officers would have to learn a new process to use any form of it. Also, other watch-standers and supervisors, including the OOD, would not be able to see or hear a command until it is initiated, making it impossible for them to intervene in a timely manner. The console option requires the Conning Officer to remain in a stationary position, which prevents checking the bridge wings or moving about to consult other watch-standers. The wristwatch option is more portable but requires great dexterity to input data via a keypad on the wrist, which becomes even more difficult in rough seas or during close maneuvering operations that require the Conning Officer’s full attention. The helmet option would turn the ship based on the wearer’s movements. It was initially designed for aviators, who remain seated throughout their missions, and is impractical for a Conning Officer whose safety duties demand motion whenever needed. SRS is the only option that enables immediate oversight and, if necessary, override by senior personnel.
E. PREVIOUS VACS STUDIES


Navy leaders have long sought to automate many of a ship’s functions. Periodic experiments have been initiated to determine whether technology has developed enough to realize the ideas and theories of automating ship-handling. Conning system automation shows a great deal of promise: speech recognition technology, considered the single greatest hindrance, has improved significantly over the last decade, and the Navy’s manpower reduction initiatives have necessitated alternatives for executing tasks previously performed by Sailors.

A Voice Activated Command System was tested as part of the Integrated Bridge System HIS Test (DT-IB 509) experiment. [Ref. 10] Preliminary experiment results include the following:

• Enhance Conning Officer situational awareness and ship safety,
• Require a high degree of user confidence in accuracy to reduce watch-stander stressors,
• Replicate current verbal ship-handling commands,
• Need a standard command vocabulary,
• Need a delay of no more than 0.1 second between command receipt and execution,
• Need less cumbersome support equipment,
• Increase Conning Officers’ receptiveness to participating in the experiment,
• Need capability for the Conning Officer to take direct control,
• Need displays showing actual position,
• Need ability to vary the confidence level for each user,
• Need misinterpretations fixed so that VACS neither takes the wrong action nor fails to act,
• Participants preferred VACS to NACS.
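Two of these findings, the need to vary the confidence level for each user and the need to avoid wrong actions, suggest a simple acceptance rule: act on a recognized command only when the recognizer’s confidence meets that user’s threshold, and otherwise ask for a repeat. The Python sketch below assumes the recognizer reports a confidence score between 0 and 1; that score, the default threshold of 0.90, and the user names are hypothetical and not taken from Dragon NaturallySpeaking or from the DT-IB 509 experiment.

```python
from dataclasses import dataclass
from typing import Dict

DEFAULT_THRESHOLD = 0.90  # hypothetical fallback for users without a tuned value


@dataclass
class Recognition:
    text: str
    confidence: float  # assumed 0.0-1.0 score reported by the recognizer


def decide(result: Recognition, user: str, thresholds: Dict[str, float]) -> str:
    """Execute only when confidence meets this user's threshold."""
    threshold = thresholds.get(user, DEFAULT_THRESHOLD)
    if result.confidence >= threshold:
        return f"EXECUTE: {result.text}"
    return "SAY AGAIN"  # request a repeat instead of guessing


# Hypothetical per-user table: one user is held to a stricter bar.
thresholds = {"ltjg_a": 0.95}
```

Returning "SAY AGAIN" on low confidence trades a short delay for protection against acting on a misrecognized order.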

These initial results demonstrate the promise of the technology and identify the Navy’s principal areas of interest for directing future experiments. This thesis will focus on speech recognition software accuracy, including experimentation with the use of standard commands. The experiment will not address the VACS as a whole; therefore, the transition of signals from digital to mechanical will not be tested.

F. VOICE RECOGNITION TECHNOLOGY

The voice recognition industry is growing rapidly as speech is incorporated into more and more applications. The first Automatic Speech Recognition (ASR) system was developed at Bell Laboratories in 1952; it could recognize the spoken digits zero through nine. Since then, ASR systems have made significant strides and now have vocabularies of thousands of words.

There are three main application areas for speech: control and data input in a “hands busy” environment, feedback in visually limited environments, and system control via telephone lines. [Ref. 11] Initially, speech was used mainly in company call centers. Today, speech is becoming commonplace at home, in the car, and at work, enabling users to interact with people, control consumer appliances, and access personal and public information. There are toys that interact with children, promoting essential cognitive and motor skills. In automobiles, drivers may request directions, and the system gives exact directions from one location to another. In some cars, drivers can also change the settings of numerous subsystems using voice commands.

Voice activated command systems are becoming a greater part of everyday life. One industry group estimated that licensing revenues and the associated proliferation of the technology would increase 30-fold between 2002 and 2006. [Ref. 12] One interesting system, called e-medICS, allows paramedics to dictate nursing notes and receive life-saving information from the medical facility while on scene. “Being able to operate the e-medICS system by speech commands leaves paramedics' hands free to effect treatment and operates equipment, thus saving vital minutes in the delivery of pre-hospital care,” according to a speech recognition case study. [Ref. 13]
In a draft Request For Proposal (RFP), the U.S. Navy requested that new Navigation, Seamanship and Ship-handling Trainers be scalable to include speech recognition as early as Fiscal Year 2004. The Navy proposes that “The voice recognition technology would have the computer respond to all student commands with the appropriate voice response and ship control response” [Ref. 14] in the simulated environment. This request clearly indicates the Navy’s interest in, and intention to incorporate, speech recognition technology in the bridge environment.

Not only is the technology developing, but so are the standards that regulate it. The National Institute of Standards and Technology (NIST) Speech Group [Ref. 15] is working with the World Wide Web Consortium (W3C) [Ref. 16] to develop baseline standards for voice solutions. These standards lay the foundation for future development. Vendors add proprietary extensions to their products, but the components are built on the same technology, enabling greater interoperability across components and businesses.

Voice Extensible Markup Language (VXML) and Speech Application Language Tags (SALT), both voice interface frameworks, are in the final stages of the voice browser certification process. VXML and SALT allow easier implementation of voice applications. Each component is independently evaluated on several technical aspects, and standards are released periodically to help developers plan the progress of a product. This is significant because standards make the technology more financially and scientifically competitive, create a greater body of knowledge, increase use of the technology, and promote collaboration between companies. As product standardization spreads, use usually increases and cost decreases. The process also enables certification of technicians and engineers for troubleshooting and repairing products, increasing the support base. Another reasonable expectation is that products withstanding the rigors of standards testing would have a longer shelf life. Industry initiatives point in a beneficial direction for developers and consumers and lead the way in establishing a firm technological base for military application of this technology.
Speech recognition software has the potential to change how U.S. Navy warships are driven in the future, a potential examined in the following chapters. Chapter II discusses the main concepts behind the speech recognition components. It also presents a brief overview of speech recognition technology, specifically Dragon NaturallySpeaking Version 6.0, and defines the metrics used to analyze this system. Chapter III discusses the experiment equipment, setting, subjects, and process. Chapter IV presents the results of the experiment along with lessons learned about the experimental process. Chapter V presents conclusions about the experiment and recommendations for further research.
II. SPEECH RECOGNITION SOFTWARE


A. SPEECH RECOGNITION SOFTWARE (SRS) COMPONENTS

Speech recognition is the process of converting an acoustic signal, captured by a microphone, into a set of words. Applications can be found, for instance, in command and control, data entry, and document preparation. Recognition is usually more difficult when vocabularies are large or contain many similar-sounding words. For example, true homonyms within the vocabulary may cause great difficulty for the recognizer. [Ref. 17] The words ‘for’ and ‘four’ sound identical yet have very different meanings, and the basic recognizer cannot tell which word the user intended. Therefore, several additional specialized components are necessary to recognize human speech, including the grammar, the lexicon, and probabilities based on the user’s profile.

Grammars, or language models, restrict the possible combinations of words
when speech is produced as a sequence of words. In the ‘for’ versus ‘four’
example, the grammar checks the context to determine which word to insert. The
lexicon defines the various pronunciations of a word. All components are
essential to accurate speech recognition software, as poor performance by any
one component severely degrades the overall recognition accuracy rate.
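To make the ‘for’ versus ‘four’ example concrete, the sketch below assumes the acoustic model scores both candidates identically and lets a bigram table choose between them from the preceding word. The probabilities, the floor value for unseen pairs, and the function name `pick_homophone` are hypothetical illustrations, not values or interfaces from DNSV6.0.

```python
# Hypothetical P(word | previous word); numbers are invented for illustration.
BIGRAM = {
    ("line", "four"): 0.20,   # e.g. "take in line four" while mooring
    ("line", "for"): 0.01,
    ("ready", "for"): 0.30,   # e.g. "ready for sea"
    ("ready", "four"): 0.001,
}
FLOOR = 1e-6  # assumed probability for word pairs never seen in training


def pick_homophone(previous_word, candidates):
    """Choose among acoustically identical candidates using bigram context."""
    return max(candidates, key=lambda w: BIGRAM.get((previous_word, w), FLOOR))


print(pick_homophone("line", ["for", "four"]))   # four
print(pick_homophone("ready", ["for", "four"]))  # for
```

The acoustics alone cannot separate the two words, so the decision rests entirely on the context supplied by the grammar.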

Figure 2 presents the typical components of an SRS. First, the digitized
speech signal is transformed into a set of useful measurements, or
representations, at a fixed rate, typically once every 10 to 20 msec. [Ref. 18]
Representations attempt to compactly preserve the information needed to
determine the phonetic identity of a sequence of speech while being as
impervious as possible to factors such as speaker differences, effects introduced
by communications channels, and paralinguistic factors such as the emotional
state of the speaker. Representations used in current speech recognizers
concentrate primarily on properties of the speech signal attributable to the shape
of the vocal tract rather than to the excitation, whether generated by a vocal-tract
constriction or by the larynx; this focus increases recognition accuracy.
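The fixed-rate analysis described above can be illustrated by slicing a digitized signal into short, overlapping frames. This is a minimal sketch assuming an 8 kHz sampling rate, a 20 msec window and a 10 msec step; it computes only log energy per frame as a stand-in for the richer spectral representations real recognizers use, and the framing parameters of DNSV6.0 itself are not documented here.

```python
import math


def frame_log_energy(samples, sample_rate=8000, window_ms=20, hop_ms=10):
    """Split samples into overlapping frames and return the log energy of each."""
    window = int(sample_rate * window_ms / 1000)  # samples per frame
    hop = int(sample_rate * hop_ms / 1000)        # samples between frame starts
    energies = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        energy = sum(x * x for x in frame)
        energies.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return energies


# One second of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
features = frame_log_energy(tone)
print(len(features))  # 99 frames, one every 10 msec
```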


Figure 2. Speech Recognition Software Components.

Next,  the  resultant  measurements  are  used  to  search  for  the  most  likely 
word  candidate,  making  use  of  constraints  imposed  by  the  acoustic,  lexical,  and 
language models and the training data. Statistical language models, based on the
estimated frequency of word-sequence occurrences, are often used to guide the
search toward the most probable sequence of words.
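The search just described is commonly formalized as a Bayes decision rule. The expression below is the standard textbook formulation, not a description of DNSV6.0’s internal search; O denotes the sequence of acoustic measurements and W a candidate word sequence:

$$\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\,P(W)$$

Here P(O | W) is supplied by the acoustic model and lexicon, and P(W) by the language model. The denominator P(O) of Bayes’ rule is dropped because it is the same for every candidate W.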

B.  INTRODUCTION  TO  SPEECH  RECOGNITION  PROCEDURE 


The  process  of  transforming  acoustic  sounds  into  written  words  or 
commands  is  complex.  The  previous  section  described  each  component.  This 
section  briefly  describes  how  the  Automatic  Speech  Recognition  (ASR),  grammar 
and  lexicon  make  the  transformation. 
The dominant recognition paradigm used for ASR is based on hidden
Markov models (HMMs), as illustrated in Figure 3. An HMM is a doubly stochastic
model, meaning that both the phoneme string (the grammar) and the acoustics
(the acoustic model) are represented probabilistically as Markov processes.
[Ref. 19] The acoustic model captures the acoustic properties of speech and
provides the probability of the observed acoustic signal given a hypothesized
word sequence. This stage has two parts: acoustic analysis and acoustic
modeling. The acoustic analysis divides the speech into a sequence of acoustic
vectors. The acoustic model consists of context-dependent subword units called
phonemes, together with the pronunciation lexicon, which defines how words
decompose into these subword units. [Ref. 20]
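A common way for an HMM recognizer to recover the most likely hidden state sequence is the Viterbi algorithm, sketched below for a toy two-state model. The states (‘sil’ for silence, ‘speech’), the quantized observations (‘quiet’, ‘loud’) and all probabilities are hypothetical; a real recognizer uses continuous acoustic vectors and many context-dependent phoneme states, and this is not DNSV6.0’s actual decoder.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for a discrete-observation HMM."""
    # best[t][s]: log probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, best[-1][last]


# Hypothetical toy model: hidden states emit quantized loudness symbols.
states = ("sil", "speech")
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3}, "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"quiet": 0.9, "loud": 0.1}, "speech": {"quiet": 0.2, "loud": 0.8}}
obs = ["quiet", "loud", "loud", "quiet", "quiet"]

path, log_prob = viterbi(obs, states, start_p, trans_p, emit_p)
print(path)  # ['sil', 'speech', 'speech', 'sil', 'sil']
```

Working in log probabilities avoids numerical underflow when sequences grow to hundreds of frames.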


The grammar, or language model, provides a statistical estimate of the
prior probability of a string of words. N-gram analysis calculates the probability
of a given series of words: given the first word of a pair, how confidently can the
next word be predicted? [Ref. 22] An N-gram can be viewed as a moving window
over a text, where N is the number of words in the window. For example, bigrams
cover two consecutive words, trigrams three consecutive words, quadrigrams four
consecutive words, and so on. Words and phonemes sound different depending on
their position in a sentence, which emphasizes the need for quality grammars and
lexicons.
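The moving-window view of N-grams, and the bigram question of how confidently the next word can be predicted from the first, can be sketched with maximum-likelihood counts. The four-command training corpus is a hypothetical example, and unsmoothed counts are used only for clarity; practical language models apply smoothing so that unseen word pairs do not receive zero probability.

```python
from collections import Counter


def ngrams(words, n):
    """Slide an n-word window over a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical training commands (illustrative only).
corpus = [
    "right standard rudder",
    "right full rudder",
    "left standard rudder",
    "rudder amidships",
]
bigram_counts = Counter()
history_counts = Counter()  # how often each word starts a bigram
for sentence in corpus:
    for first, second in ngrams(sentence.split(), 2):
        bigram_counts[(first, second)] += 1
        history_counts[first] += 1


def bigram_prob(first, second):
    """Maximum-likelihood estimate of P(second | first)."""
    if history_counts[first] == 0:
        return 0.0
    return bigram_counts[(first, second)] / history_counts[first]


print(ngrams("steer course zero one five".split(), 3))
print(bigram_prob("right", "standard"))   # 0.5
print(bigram_prob("rudder", "amidships")) # 1.0
```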

A  lexicon  defines  the  pronunciation  of  a  word  and  includes  information 
such  as  phoneme  length.  It  usually  includes  multiple  pronunciations  of  a  word  in 
order to accommodate a wider variety of speech patterns. For example, ‘tomato’
can be pronounced ‘to may to’ or ‘to mah to’. Lexical design entails two main
phases:  first,  selection  of  the  vocabulary  and  second,  representation  of  the 
pronunciation  entry  using  the  basic  units  of  the  recognition  system.  Lexicons  are 
often  manually  created  and  make  use  of  knowledge  and  expertise  that  is  difficult 
to  codify.  [Ref.  23] 
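A pronunciation lexicon can be pictured as a mapping from each word to one or more phoneme sequences. The sketch below uses ARPAbet-style symbols with stress marks omitted; the entries, including the two ‘tomato’ variants, and the helper `words_matching` are illustrative assumptions rather than entries or functions from the DNSV6.0 lexicon.

```python
# Hypothetical lexicon: word -> list of alternative pronunciations.
LEXICON = {
    "tomato": [("T", "AH", "M", "EY", "T", "OW"),   # 'to may to'
               ("T", "AH", "M", "AA", "T", "OW")],  # 'to mah to'
    "rudder": [("R", "AH", "D", "ER")],
    "meet": [("M", "IY", "T")],
    "her": [("HH", "ER")],
    "meter": [("M", "IY", "T", "ER")],
}


def words_matching(phonemes):
    """Return every word that has a pronunciation equal to the phoneme sequence."""
    target = tuple(phonemes)
    return [word for word, prons in LEXICON.items() if target in prons]


print(words_matching(["T", "AH", "M", "AA", "T", "OW"]))  # ['tomato']
print(words_matching(["M", "IY", "T", "ER"]))             # ['meter']
```

Listing both variants under one word lets either pronunciation resolve to ‘tomato’, which is the purpose of multiple lexicon entries.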

C.  SPEECH  RECOGNITION  PARAMETERS 

A criterion used to determine the usefulness or applicability of an SRS to a
particular process is called a parameter. Each parameter has a range, or scale,
by which it is measured; the range runs from the least to the most complex mode
of that parameter. Many parameters must be considered when choosing an SRS,
and Table 1 presents the most common ones. User adoption rates, environment,
amount of training necessary and the accuracy rate are all influenced by the
parameters.


PARAMETER          RANGE
Speaking Mode      Isolated Word to Continuous Speech
Speaking Style     Script to Spontaneous
Enrollment         Speaker Dependent to Speaker Independent
Vocabulary         Small (<20 words) to Large (>20,000 words)
Language Model     Finite State to Context Sensitive

Table 1. Common Speech Recognition Parameters.
An isolated-word speech recognition system requires the speaker to
pause briefly between words, whereas a continuous speech recognition system
allows people to speak more naturally. Spontaneous speech contains
irregularities, such as ‘uhs’ and ‘ums’, and is much more difficult to recognize
than speech read from a script. Some software requires speaker enrollment, in
which a user trains the software by providing speech samples that form a user
profile. This training phase allows the system to distinguish words from
background noise more easily, thereby decreasing the error rate. Other SRS are
categorized as speaker-independent, in that no enrollment is necessary;
speaker-independent software generally produces more errors. In addition, the
size of the vocabulary affects the time necessary to recognize a word: the larger
the vocabulary, the longer recognition may take. Finally, a context-sensitive
language model is more accurate than a finite-state model. The context-sensitive
model examines the surrounding words as well as the phonemes to determine the
most appropriate word, whereas the finite-state model makes its determination
based solely on the phonemes themselves.

Speech  Recognition  Software  is  typically  designed  for  use  with  a  particular 
set  of  words,  but  SRS  users  may  want  or  need  to  use  words  not  built  into  the 
default vocabulary, leading to out-of-vocabulary word problems. A word not
listed in the vocabulary is mapped to a word that is in the dictionary, causing an
error. ScanSoft designed Dragon NaturallySpeaking Version 6.0 to address that
problem, one of several issues that arise when using COTS SRS for conning a
ship. An SRS must meet certain criteria for use on a U.S. warship:

•  Accuracy rate equal to or greater than that of a human,

•  Ability to respond using verbal ship-handling vocabulary,

•  Use of standard conning commands,

•  Maneuverable  support  equipment,  and 

•  Concise  seamanship  vocabulary. 
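The out-of-vocabulary error described above, in which an unknown word is forced onto the closest vocabulary entry, can be imitated with string similarity from Python’s standard `difflib` module. Spelling similarity is only a crude stand-in for the acoustic similarity a recognizer actually uses, and both the small vocabulary and the helper `force_into_vocabulary` are hypothetical.

```python
import difflib

# Hypothetical general-purpose vocabulary that lacks the seamanship term "amidships".
VOCABULARY = ["ships", "amid", "midst", "rudder", "steer"]


def force_into_vocabulary(word, vocabulary=VOCABULARY):
    """Return the word if known, otherwise the most similar vocabulary entry."""
    if word in vocabulary:
        return word
    # cutoff=0.0 guarantees some entry is returned, mimicking a recognizer
    # that must always output an in-vocabulary word.
    return difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.0)[0]


print(force_into_vocabulary("rudder"))     # rudder
print(force_into_vocabulary("amidships"))  # ships (a recognition error)
```

Adding "amidships" to the vocabulary removes this particular error, which is why the ability to add words matters for seamanship terms.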
D. DRAGON NATURALLYSPEAKING VERSION 6.0 (DNSV6.0)


DNSV6.0  Professional  is  a  commercial-off-the-shelf  (COTS)  continuous 
speech  recognition  software  program  designed  for  use  in  an  office  environment. 
NaturallySpeaking  6  is  fast  and  responsive;  it  reacts  crisply  and  quickly  to  both 
voice commands and dictation. [Ref. 24] The Consumer Reviewer “Consensus
Report” shows “the number of times products are top-ranked by reviewers
included in All The Reviews Reviewed chart” [Ref. 25]. These counts, reproduced
in Table 2, present a convincing argument that software and computer reviewers
considered DNSV6.0 the preferred SRS on the 2002 market. The following
characteristics made it appropriate for the current study:


# of Picks   Software Brand
7            ScanSoft Dragon NaturallySpeaking Preferred
1            IBM ViaVoice Pro
1            L&H Voice Express (discontinued)

Table 2. All The Reviewers Reviewed Chart.

•  A  large  vocabulary, 

•  Speaker dependent, which generally yields greater accuracy,

•  Training  is  quick  and  easy.  A  very  good  speech  profile  can  be 
created  within  15  minutes.  An  additional  15  to  30  minutes  of 
training  leads  to  an  excellent  speech  profile. 

•  A centralized accuracy center that allows users to input their
specific information for better recognition. It can learn grammatical
style and new vocabulary from previously typed documents.

•  Ability  to  handle  spontaneous  speech  and  to  add  words  to  the 
vocabulary.  The  ability  to  add  words  is  crucial  since  seamanship 
terms  are  not  part  of  the  average  office  environment  conversation. 

•  Capacity to correct the document as the person is speaking,

•  The highest recognition rate listed among the SRS competitors,
which adds to its appeal.

•  Ease  of  use  with  various  computer  configurations  also  made  it  a 
logical  choice.  Z.  M.  Gao  claims  one  competitor  is  practically 
unusable  in  programs  other  than  Microsoft  Word  and  SpeakPad. 
[Ref.  26] 

•  Designed to accept voice commands, indicating that its developers
were already researching speech-activated command and control.

•  Its manufacturer has developed specialized versions for the legal,
medical and public works communities, suggesting that a version
specialized for seamanship terms could be produced more easily.
By contrast, some systems are strictly telephony-based and are not
well suited to our application.

E.  SPEECH  RECOGNITION  HARDWARE  REQUIREMENTS 

DNSV6.0  requires  the  following  hardware  and  software  to  operate 
properly  in  an  office  setting:  Intel®  Pentium®  II  400  MHz  processor,  128  MB 
RAM,  300  MB  free  hard  disk  space,  Microsoft®  Windows®  XP,  Millennium,  2000, 
or 98, a 16-bit recording sound card, Microsoft® Internet Explorer® 5 or higher,
a  CD-ROM  drive,  a  noise  canceling  headset  microphone  and  speakers.  The 
speakers  allow  the  other  officers  on  the  bridge  to  hear  the  system  text-to-speech 
(TTS)  responses  and  confirm  the  ship’s  movements.  Install  DNSV6.0  as  a 
stand-alone  application  or  turn  off  all  software  applications  not  needed,  including 
background  applications  such  as  anti-virus  detectors.  This  allows  DNSV6.0  to 
utilize  all  available  computing  power  and  improves  recognition  accuracy. 
Although  DNSV6.0  works  with  all  these  systems,  optimal  performance  is 
achieved  with  a  500  MHz  processor  or  faster  and  256  MB  RAM.  [Ref.  27] 

Other criteria also affect performance when choosing the hardware for this
system. The sound card should be of high quality and should have a sound
booster, which adjusts the sound volume. One tactic frequently used is to turn on
the system and not speak for a few seconds; the lack of sound automatically
activates the sound booster, improving recognition accuracy. Close attention
should also be given to microphone selection. The manufacturer’s web site lists
several sound cards, microphones and speakers that are compatible with the
system. Specific considerations regarding the environment, the noise canceling or
dampening capability, the user’s comfort and the portability (wired versus
wireless) of the microphone went into the selection process for the experiment.
The Conning Officer’s need to move to various stations in and around the bridge
restricted the selection to wireless microphones.

Noise  dampening  capability  makes  a  vast  difference  in  the  overall 
performance  of  VACS  by  reducing  noise  interference  from  various  sources. 
Noise  comes  from  several  sources  including  the  ship’s  engine  or  mechanical 
gear,  environmental  factors  such  as  wind  and  rain,  co-workers  and  other  bridge 
equipment. Ships are also known to shudder at times, which further contributes
to ambient noise. Most of these sources are uncontrollable; therefore, the noise
dampening capability of VACS becomes all the more important. The more clearly
the acoustic signal is delivered to the speech recognition software, the greater
the resulting accuracy.
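Because bridge noise comes from several independent sources, it helps to recall that sound levels in decibels do not add arithmetically. For uncorrelated sources the combined level is L = 10 log10(Σ 10^(Li/10)). The source names and levels in this sketch are hypothetical and are not MSI measurements.

```python
import math


def combine_levels(levels_db):
    """Combined sound pressure level of uncorrelated sources, in dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


# Hypothetical source levels at the Conning Officer's position (dB).
sources = {"engine": 65.0, "wind": 65.0, "bridge chatter": 60.0}
total = combine_levels(sources.values())
print(round(total, 1))                          # 68.6
print(round(combine_levels([65.0, 65.0]), 1))   # 68.0: doubling a source adds about 3 dB
```

The loudest sources dominate the total, so removing a quieter source barely lowers the level reaching the microphone; this is one reason noise dampening at the microphone matters.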

F.  VOCABULARY 

The global vocabulary in DNSV6.0 is designed for use by office
professionals, each of whom has his or her own copy. It is considered large, with
over 200,000 words. A large vocabulary allows more spontaneous speech with
fewer corrections, provided the user speaks the language typically used in an
office. The software designers envisioned one person installing DNSV6.0 at a
personal workstation and then tailoring it to his or her particular needs, with the
tailoring occurring as each user adds words to a personal profile. Although
adding words seems simple, in reality it is time consuming, because each user
must update a personal profile vice one administrator updating the global
vocabulary. In addition, words cannot be deleted from the global vocabulary.
Words that are irrelevant to the Conning Officer, or similar to terms the Conning
Officer commonly uses, are still compared to the incoming acoustic stream,
slowing down the response time and causing errors. Advanced users may
overcome this problem by selecting an Empty Dictation at initial set-up and
populating the vocabulary from scratch. [Ref. 28] However, the software was used
in its preset configuration, since this experiment was designed for novice users.

Seamanship vocabulary and shipboard use present a challenge for any
COTS SRS, DNSV6.0 included. Neither DNSV6.0 nor any other current
commercial SRS includes seamanship terms in its global vocabulary. The
vocabulary is statistically weighted to recall more frequently used words first, so
newly added words receive a lower statistical rating than words initially listed in
the global vocabulary.

The lack of written conning command documentation that could be
scanned into DNSV6.0 to help it learn new words and phrases means the
software must learn from current user interaction. DNSV6.0’s ability to add words
to a user’s profile helps immensely in overcoming this problem, as only with
repeated use can the SRS learn to recall seamanship terms ahead of words more
commonly used in the office environment.

The language used by the Conning Officer is unique but standardized. The
vocabulary is restricted to approximately one hundred different words used to
drive the ship. These words are arranged in a strict grammar into phrases for
specific maneuvers, called commands.

Even though the phrases are short and standardized, there are several
ways to pronounce them, and the phraseology changes slightly depending on the
ship or even on the Commanding Officer. For example, on ships with more than
one rudder the Conning Officer may say ‘rudder amidships’ or ‘rudders
amidships’. The ‘s’ on the end seems trivial to the Helmsman, but the software is
not expecting that ‘s’ and looks for a similar word ending in ‘s’, creating an error.

G.  USER  ENROLLMENT 

One reason for choosing DNSV6.0 was its easy enrollment, as mentioned
previously. The system provides step-by-step instructions to help every new user
create a profile and perform basic functions. The average novice can enroll in
approximately 15 minutes. During enrollment, the system adjusts the volume
setting based on the individual’s speaking style. It also evaluates the sound
system, providing a Speech-to-Noise ratio. Finally, the system records the user’s
speech pattern and style as he or she reads a set passage.

Speech impediments and an extremely noisy setting will affect the
software’s ability to complete the user profile and will decrease its accuracy rate.
Lisps, slurred words and similar impediments decrease the software’s ability to
recognize the user’s speech. If a person’s speaking ability changes, he or she will
need to re-enroll in the system or avoid using it until the voice returns to normal.
The optimal setting is a quiet room without any distractions. In practice, however,
enrollment should take place in a setting similar to the environment in which the
software will be used, as background noise in the primary setting will cause
distortions if it is not accounted for during training.

H.  METRICS 

Error rate, or accuracy rate, is a common measure used to evaluate SRS
performance. The error rate, E, is typically expressed as the word error rate,
given in Equation (1):

$$E = \frac{S + I + D}{N} \times 100, \qquad (1)$$

where N is the total number of words in the test set, and S, I, and D are the total
numbers of substitutions, insertions, and deletions, respectively. [Ref. 29]
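Equation (1) presupposes an alignment between the reference transcript and the recognizer output that decides which words count as substitutions, insertions or deletions. A common way to obtain it, assumed here rather than taken from Ref. 29, is a word-level minimum edit distance. The sketch counts S, I and D along one optimal alignment and then applies Equation (1); the function names are illustrative, and the example reuses the ‘steer course 015’ insertion case described among the software error types.

```python
def word_error_counts(reference, hypothesis):
    """Return (S, I, D, N) from a minimum edit distance word alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # i deletions
    for j in range(cols):
        cost[0][j] = j  # j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # deletion
                             cost[i][j - 1] + 1)        # insertion
    # Trace back one optimal alignment, counting each error type.
    s = ins = d = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            s += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return s, ins, d, len(ref)


def word_error_rate(reference, hypothesis):
    """Equation (1): E = (S + I + D) / N * 100 (N must be nonzero)."""
    s, ins, d, n = word_error_counts(reference, hypothesis)
    return (s + ins + d) / n * 100


print(word_error_counts("steer course 015", "steer course to 015"))        # (0, 1, 0, 3)
print(round(word_error_rate("steer course 015", "steer course to 015"), 1)) # 33.3
```

Because insertions are counted against the reference length N, E can exceed 100 when the recognizer adds many words.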

This system’s effectiveness is measured with several metrics. Equation (1)
will be used to determine both the software’s and the human’s accuracy. There
are four types of software errors:
•  Software recognizes the wrong word when the correct word
is in the vocabulary

This is an example of outright misinterpretation. The user may have
pronounced the word differently when creating the user profile, for a variety of
reasons, including a new context or position in a sentence, or different intonation
or emphasis on a syllable.

•  Software recognizes the wrong word when the correct word is
NOT in the vocabulary

This is an example of a user stating a word that the software does not
have in its vocabulary. The software maps the spoken word to the vocabulary
word it most closely resembles.

•  Software  does  not  acknowledge  a  word  spoken  by  the 
Conning  Officer 

This  is  an  example  of  the  software  not  hearing  the  word,  or  hearing  it  and 
determining  it  to  be  part  of  another  word  or  background  noise.  For  example,  the 
phrase  ‘meet  her’  may  be  misinterpreted  as  ‘meter’. 

•  Software  adds  a  word  NOT  spoken  by  the  Conning  Officer 

This is an example of the language model trying to make the acoustic
input into a complete sentence. For example, the conning command is
“steer course 015”. The software tries to interpret the sound and follow its
built-in grammar by adding the word ‘to’, so that the phrase reads
“steer course to 015”.
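Given an aligned pair of spoken and recognized words, the four software error types above can be told apart mechanically. The sketch assumes the alignment has already been produced (for example, by a word-level edit distance) and uses a small hypothetical vocabulary; the function `classify` and its label strings are illustrative and are not part of DNSV6.0.

```python
# Hypothetical recognizer vocabulary, used to separate the two substitution types.
VOCABULARY = {"steer", "course", "to", "rudder", "meter", "ships"}


def classify(spoken, recognized, vocabulary=VOCABULARY):
    """Label one aligned (spoken, recognized) word pair.

    None on either side marks a missing word in the alignment.
    """
    if spoken is None:
        return "insertion"              # software adds a word not spoken
    if recognized is None:
        return "deletion"               # software does not acknowledge a word
    if spoken == recognized:
        return "correct"
    if spoken in vocabulary:
        return "substitution_in_vocab"  # wrong word; correct word was available
    return "substitution_oov"           # wrong word; correct word not in vocabulary


alignment = [("steer", "steer"), ("course", "course"), (None, "to"),
             ("015", "015"), ("amidships", "ships")]
for spoken, recognized in alignment:
    print(spoken, recognized, classify(spoken, recognized))
```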

Along with the software errors, there are also human errors in the conning
process. A Helmsman may make such an error for numerous reasons: distraction,
inability to hear well, or acting by rote, when the Helmsman is so accustomed to
a particular maneuver in a specific situation that he reacts without fully
comprehending the Conning Officer’s command.

•  Helmsman  hears  an  incorrect  command  and  performs  an  incorrect 
action. 

•  Helmsman  hears  an  incorrect  command  and  performs  the  correct 
action. 
•  Helmsman hears a correct command and performs an incorrect
action.

•  Helmsman  does  not  acknowledge  a  command  spoken  by  the 
Conning  Officer. 

This study seeks to create an experimental environment for recording
each occurrence of each error type and calculating, for each subject and trial, the
ratio of those occurrences to the total word count. The results should indicate the
feasibility of using this software on a U.S. Navy warship and identify the sources
of error wherever possible.
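The bookkeeping described above, counting each error type and dividing by the total word count for every subject and trial, might be organized as follows. The records are invented placeholders, not data from this experiment, and each record is assumed to carry at most one error.

```python
from collections import defaultdict

# Hypothetical per-command records: (subject, trial, words_spoken, error_type or None).
records = [
    ("A", 1, 3, None),
    ("A", 1, 4, "insertion"),
    ("A", 1, 2, "deletion"),
    ("B", 1, 3, "substitution_oov"),
    ("B", 1, 3, None),
]


def error_ratios(records):
    """Return {(subject, trial): {error_type: errors per word spoken}}."""
    words = defaultdict(int)
    errors = defaultdict(lambda: defaultdict(int))
    for subject, trial, word_count, error_type in records:
        key = (subject, trial)
        words[key] += word_count
        if error_type is not None:
            errors[key][error_type] += 1
    return {key: {etype: count / words[key] for etype, count in errors[key].items()}
            for key in words}


for key, ratios in error_ratios(records).items():
    print(key, ratios)
```

Keeping the subject and trial in the key allows the same table to be grouped later by setting, scenario or vessel.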
III. METHODOLOGY AND DATA COLLECTION FOR VOICE
RECOGNITION SOFTWARE EXPERIMENT

A.  OVERVIEW 

The  objective  of  this  study  is  to  determine  the  performance  of  COTS 
speech  recognition  software  in  a  simulated  bridge  environment.  In  an  effort  to 
better  understand  and  make  inferences  regarding  what  produced,  caused  or 
contributed  to  SRS  performance,  this  section  presents  the  observational  frame  of 
reference,  the  assumptions  and  the  experiment  design  prior  to  the  experiment’s 
initiation.  The  expectations,  experiment  design  and  possible  factors  reducing  the 
reliability  of  the  data  will  also  be  considered. 

Expectations are the ideas researchers hold going into the experiment,
which are proven true or false by the resulting data. Each expectation addresses
specific questions regarding software performance versus human error. The
experiment is designed to reduce the chance that the outcome is due to anything
but the independent variables. Experimental designers need to consider six
major classes of information: “post-treatment behavior or physical measurement,
pre-treatment behavior or physical measurement, internal threats to validity,
comparable groups, experiment errors, and the relationship to treatment”.
[Ref. 30]

Each of these issues will be addressed with the exception of “comparable
groups”, since the experiment required individual subject comparisons, not
comparisons between groups. Post-treatment information relates to analysis of
the data, and pre-treatment information covers all aspects of the experiment,
including the subjects, the software, the environment and the expectations.
Internal threats to validity are factors that discredit or make ambiguous the cause
and effect relationship. Experiment errors are any actions or side effects that
result in inaccurate or false data. The relationship to treatment refers to factors,
such as the sequence or setting, that may cause different effects in the data.




B.  EXPERIMENT  OBJECTIVES 


The basic measure of performance selected in this work is the number of
words not recognized divided by the total number of words, computed for each
trial run. Metrics include software and human errors, as described in Chapter II.
Table 3 shows how the observed results are organized: each cell lists the
observation and identifies the setting, simulation scenario and vessel for that
trial.


             Trial 1     Trial 2     Trial 3     Trial 4     Trial 5
Subject A    Result
             (c, m, d)
Subject B
Subject C                IMPROVEMENT
Subject D
Subject E

Setting:   C = console, S = simulator
Scenario:  U = underway, M = mooring, C = channel
Vessel:    d = Arleigh Burke (DDG), f = Frigate (FFG)

Table 3. Experiment Expectations.

C.  EXPERIMENT  DESIGN 


This  investigation  compares  performance  by  one  unit,  DNSV6.0,  using  five 
subjects.  The  treatment  was  the  trial  performed  by  each  subject.  Each  trial 
lasted  approximately  twenty  to  thirty  minutes. 

The subjects formed a block design group, which means that they share
known commonalities that are expected to affect the outcome of the experiment.
[Ref. 31] The block design applies to this study because every subject in the
group has three common properties: (1) extensive ship-handling experience,
(2) Officer Of the Deck (OOD) qualifications and (3) being male.

Factors affecting the outcome included: a) setting (console or simulator),
b) simulation type (underway steaming, mooring or leaving the channel) and
c) vessel type (destroyer or frigate). A minimum of three and a maximum of five
trials were performed with each subject. The trials were performed between
normal Marine Safety International (MSI) operations. Therefore, some subjects
executed trials one after another, while others completed one trial each day or
whenever it best suited their schedules.

Randomness is important in an experiment to remove bias, as the design
of a study is biased if it systematically favors certain outcomes. [Ref. 32]
Testing the subjects in varying ways decreases the likelihood that the experiment
is biased. Another form of randomness introduced in our study was variation in
which simulation program was run and which vessel was conned. The subjects
had the opportunity to simulate conning an Arleigh Burke Destroyer (DDG) or a
Perry-Class Frigate (FFG) with Auxiliary Power Units (APUs). Both vessels have
gas turbine engines. There were three simulations to choose from: a) underway
steaming, b) mooring, and c) leaving the channel. There were also two locations
from which to conn, at the console or in the simulator, and subjects conned from
both locations. Although randomness is a positive aspect of the experiment, the
variation may cause experimental error.

Experimental error is “variation produced by disturbing factors, both
known and unknown”. [Ref. 33] Experimental error can lead to incorrect
conclusions when data are hidden or skewed. By reducing the unexplained
variance in the experiment setting and implementation, the researcher reduces
the possibility of experimental error, which in turn increases the probability of
reaching an accurate conclusion. The design seeks to avoid incorrect conclusions
and confusion between correlation and causation.
Correlation occurs when one or more variables are associated with
another variable. For example, a correlation among the type of ship, the setting
and the system performance does not mean that the system performance was
directly caused by the relationship between the ship and the setting. Causation
occurs when a factor produces a change in the experiment outcome. The subject
is an example: different subjects are expected to yield different outcomes given
the same scenario or setting. Careful design and analysis attempt to ensure that
a factor is treated as a cause of the result only when it actually produces a
change, and not merely because it correlates with other factors. This leads to the
complexity of effects.

Complexity  of  effects  occurs  as  multiple  factors  are  taken  into 
consideration.  The  investigator  must  identify  how  the  factors  relate  to  one 
another, if at all, and then base a decision within those parameters. The greater
the number of factors, the greater the chance that complexity of effects will
occur. On a final experimental design note, this study employed the randomized
block design, vice Latin square, because of potential interaction among factors.
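A randomized block design treats each subject as a block and randomizes the treatments within it. The sketch below draws each subject’s trial conditions at random from the setting, scenario and vessel factors listed above, requiring both conning locations for every subject as in this study, and uses an arbitrary fixed seed so the schedule can be reproduced. It only illustrates the idea; it does not reconstruct the actual assignments made at MSI, which also depended on the simulator schedule, and the function name `block_schedule` is hypothetical.

```python
import itertools
import random

SETTINGS = ["console", "simulator"]
SCENARIOS = ["underway", "mooring", "channel"]
VESSELS = ["DDG", "FFG"]
SUBJECTS = ["A", "B", "C", "D", "E"]


def block_schedule(seed=2003, min_trials=3, max_trials=5):
    """Randomize treatment combinations within each subject (block)."""
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    treatments = list(itertools.product(SETTINGS, SCENARIOS, VESSELS))
    schedule = {}
    for subject in SUBJECTS:
        n_trials = rng.randint(min_trials, max_trials)
        while True:
            # sample() draws distinct combinations in random order for this block.
            trials = rng.sample(treatments, n_trials)
            # Redraw until the subject conns from both locations.
            if {setting for setting, _, _ in trials} == set(SETTINGS):
                break
        schedule[subject] = trials
    return schedule


for subject, trials in block_schedule().items():
    print(subject, trials)
```

Randomizing within each block keeps subject-to-subject differences from being confused with the effects of setting, scenario or vessel.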

D.  EQUIPMENT  AND  SIMULATOR 

The experiment called for the use of a laptop computer, a digital recorder
and a wireless microphone system. The laptop was a Fujitsu C Series LIFEBOOK
with an Intel® Pentium® 4 CPU at 1.60 GHz and 256 MB of RAM. A Sony Digital
Voice Recorder with an 8 MB Memory Stick was used to record the responses
from the console operator. An operator acted as the Helmsman, Lee Helmsman
and any other bridge personnel necessary for the completion of a ship’s
movement. A SHURE ULX/S Standard Wireless Microphone System provided the
flexibility needed in a bridge environment. The ULX/S has an RF carrier
frequency range of 554 to 865 MHz with an effective range of 100 meters, and an
audio frequency response of 25 to 15,000 Hz with ±2 dB variation. It uses a
battery pack, which easily clips to the Conning Officer’s belt or pocket. The
battery life is eight to nine hours using a 9V Duracell MN1604 alkaline battery.
[Ref. 34]

The  experiment  was  performed  at  the  MSI  simulators  in  San  Diego, 
California.  MSI  has  been  providing  ship-handling  training  to  the  commercial 
maritime  industry  and  the  U.S.  Navy  since  1974.  MSI  centers  utilize  the  latest 
simulation  techniques  to  provide  a  realistic  environment,  to  include  the  sounds 
associated  with  ship  maneuvers,  without  real-world  risks,  focusing  on  the 
decision-making  process  vice  the  reaction  process.  Their  courses  are  compliant 
with  all  applicable  International  Maritime  Organization  (IMO),  Standards  of 
Training,  Certification  and  Watch-keeping  for  Seafarers  (STCW),  United  States 
Coast  Guard  (USCG)  and  other  regulations.  [Ref.  35] 

E.  EXPERIMENT  SETTING 

Upon arrival at MSI, the wireless system and laptop were set up at the
simulator console. The console is located in an approximately 20’ × 20’
multipurpose room with access to a classroom, the passageway to the simulator
and the main entrance area, as shown in Figure 4. The room is used for meetings,
instruction and breaks as well as serving as the simulator command center. Foot
traffic and conversations are a normal part of this setting.
Figure 4. MSI Console Room.

The simulator is positioned approximately 50 feet away. It provides a 3-D
visual and auditory environment in which Conning Officers practice ship
movements. The simulator is significantly noisier than the multipurpose room:
bow waves, buoy bells, environmental noise and other nautical sounds are
simulated to create a more realistic environment. Table 4 below provides the
noise levels in the simulator and console room for each type of scenario.
Bridge Readings | A-weighting | C-weighting
Ambient Noise (UPS and AC) | 66.2 dB | 70.2 dB
FFG Pierside (gas turbine, no wind, no bow wave) | 71.1 dB | 71.7 dB
FFG Underway (10 knots, 10 knot relative wind) | 69.3 dB | 71.7 dB
FFG Underway (10 knots, 20 knot relative wind) | 69.8 dB | 72.0 dB
FFG Underway (10 knots, 20 knot relative wind, gyro noise due to 60°/min ROT) | 70.2 dB | 72.6 dB
FFG Underway (10 knots, 20 knot relative wind, own ship whistle) | 76.7 dB | 86.0 dB
FFG Underway (10 knots, 20 knot relative wind, conning commands given) | 86.3 dB | 88.2 dB

Console Readings | A-weighting | C-weighting
Ambient Noise (Computers and AC) | 50.0 dB | 70.0 dB
FFG Pierside (gas turbine, no wind, no bow wave) | 51.5 dB | 71.9 dB
FFG Underway (10 knots, 10 knot relative wind) | 52.1 dB | 73.7 dB
Doug Atherton Conning | 78.1 dB | 82.3 dB
Bill Kirkland Conning | 70.0-72.7 dB | 75.9-76.9 dB

Readings were made with a sound pressure indicator. The voice and gyro sources were one foot from the meter.

Table 4. MSI Noise Levels.


Dragon NaturallySpeaking Version 6.0 had been installed on the laptop beforehand. Each participant was shown how to position the wireless microphone headset properly and spent approximately 20 minutes creating a user profile, which trains the software to adjust to the speaker’s speech volume, sound quality and voice. After the user profile was created, a conning command vocabulary was added, and each participant trained the software to recognize the new conning commands.

F.  EXPERIMENT  PROCESS 

After receiving an explanation of the purpose of the experiment and general guidelines for training the software, subjects fitted and adjusted the microphone to their optimal position. Next, following the step-by-step instructions in the DNSV6.0 setup, they created a user profile by speaking into the microphone in exactly the same manner as they would when giving conning commands on a ship. Once the user profile was produced, subjects recorded a list of seamanship words and phrases into it.

After creating the user profile, each subject was asked to perform a trial run in the simulator. In addition to the computer’s recording, each discrepancy between the Conning Officer’s speech and the software’s resulting text was recorded in a narrative log. Upon completion of each trial, the data were reviewed and the original saved. The discrepancies noted in the software’s output were then corrected immediately so that the speech engine would associate the sounds with the correct words. Following the corrections, a new trial was performed and the process continued. In this iterative process the software “learned” the user’s speech patterns, and performance was expected to improve with each trial run per user. [Ref. 36]

G.  EXPECTATION  AND  CONSIDERATIONS 

1.  Expectations 

The first assumption is that Dragon NaturallySpeaking Version 6.0 will perform differently depending on the subject. As discussed in Chapter II, the subject’s speech patterns, accent, software training style and voice volume affect the software’s accuracy. This leads to the first expectation considered in our study.

E1:  Variability  of  software  performance  is  dependent  upon  the 
subject. 

Note that the software is designed to learn the subject’s speech characteristics through repeated use and correction, which would be indicated by an improved recognition rate. Performance should therefore improve with each trial, leading to a second expectation.

E2: System performance will increase with subsequent trials compared to previous trials.

The setting, vessel type and simulation scenario varied among trials. Neither the vessel type nor the simulation scenario should influence the results for professional career mariners. The setting, on the other hand, may affect system performance because of differences in noise levels. These considerations are captured in the third and fourth expectations.

E3: There is no significant difference in the software performance due to the vessel type or simulation scenario.

E4:  Setting  affects  the  system  performance. 

Lastly, the combined effects of the subject, simulation scenario and setting may be a source of variation in software performance. A subject may be more comfortable conning with a particular Helmsman, or in one scenario or setting than in another. These interactions may influence the interpretation of the results and warrant analysis [Ref. 37], as stated in the fifth expectation.

E5: Interaction between the subjects, simulation scenario and setting may cause variation in the software performance.
2. Considerations


Many variables must be considered when reviewing and analyzing the results of an experiment. Each variable, and its interaction with other variables, affects the outcome and the interpretation of the data. This section highlights the most prominent variables.

According to the manufacturer, ScanSoft, DNSV6.0 is designed to type at least 80 percent of a user’s dictation accurately after the initial training session and to achieve a 90 to 98 percent accuracy rate for most users. [Ref. 38] The expectation is that each Conning Officer will experience system performance at least at the stated level. The most valuable outcomes of this experiment concern the software’s performance initially and then with repeated use.

The Helmsman and Lee Helmsman functions were performed by two individuals, each with over 30 years of ship control experience, so the trials probably ran more smoothly than they would have with a less experienced Helmsman. The human error attributable to Helmsman performance may therefore not be representative of what one might observe in the fleet. Furthermore, the number of ship control miscues caused by the Conning Officers’ own mistakes is expected to be lower, because each participant has several more years of conning experience than the average fleet operator. In fact, errors due to misinterpretation by the Helmsman/Lee Helmsman or mistakes by the Conning Officer are expected to be rare in this environment.

The software may choose the incorrect word. Two issues must be taken into account: (1) the vocabulary and (2) the statistical weighting of the vocabulary. As noted in Chapter II, DNSV6.0 has an expansive global vocabulary and allows the user to add words. Through repeated use, words were added to each individual’s vocabulary rather than to the global vocabulary, a time-consuming and repetitive process. A ScanSoft representative pointed out a shortcoming of the SRS: a user cannot add words directly to the global vocabulary. [Ref. 39] Instead, the developers must define the global vocabulary explicitly in code at the factory, as is done for the DNS legal and medical versions.

DNSV6.0 Professional is predefined to select the word with the highest probability of use in a typical office environment. Because the seamanship terms were added to the original vocabulary for the purpose of this study, they initially have an extremely low statistical probability. The software is therefore more likely to choose a non-seamanship term until the Conning Officer has used the seamanship term often enough to give it a higher probability than any similar-sounding word. For example, a Conning Officer says ‘very well’ to acknowledge helm responses to orders. ‘Farewell’ is a common closing salutation in the business world, so DNSV6.0 chooses ‘farewell’ until ‘very well’ has been used repeatedly and corrected in the software, raising its probability above that of ‘farewell’.
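The selection behavior described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not DNSV6.0’s actual (proprietary) language model: the hypothetical `ToyVocabulary` class assumes that the two similar-sounding candidates receive identical acoustic scores, so the choice is settled by usage counts alone, and the starting counts are invented.

```python
from collections import Counter


class ToyVocabulary:
    """Hypothetical, simplified word chooser; not the DNSV6.0 language model."""

    def __init__(self, prior_counts):
        # Invented usage counts standing in for office-oriented word statistics.
        self.counts = Counter(prior_counts)

    def choose(self, candidates):
        # Assumes equal acoustic scores, so the most frequently used word wins;
        # on a tie, max() keeps the first candidate listed.
        return max(candidates, key=lambda word: self.counts[word])

    def correct(self, word):
        # Each user correction is modeled as one more observed use of the word.
        self.counts[word] += 1


vocab = ToyVocabulary({"farewell": 5, "very well": 0})
candidates = ["farewell", "very well"]

first_choice = vocab.choose(candidates)
corrections = 0
while vocab.choose(candidates) != "very well":
    vocab.correct("very well")
    corrections += 1

print(first_choice, corrections)  # prints: farewell 6
```

With these invented counts, ‘farewell’ is chosen first and six corrections are needed before ‘very well’ wins; a stronger office-vocabulary prior would require proportionally more corrections.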

The environment poses a challenge to the external validity of the experiment, where external validity is defined as the degree to which the conclusions reached in this study would hold for other persons, in other places and at other times. [Ref. 40] The environment for this study is not as noisy as the bridge of a ship, even though the simulator generates equipment, wind and wave noise. In addition, there are potential internal validity issues, such as selection and experimenter bias. Internal validity is the ability to show cause and effect between dependent and independent variables. [Ref. 41] The selection factor is the participants’ extensive experience, which tends to reduce the likelihood of mistakes and misinterpretation compared with conning officers throughout the fleet; for example, the helmsman often anticipates the conning officer’s commands. Concurrent real-world operations severely limited the pool of conning officers and helmsmen available for this study. Finally, as the experiment progressed, the researchers’ improving ability to observe the experiment and annotate discrepancies may have led to moderate unintentional experimenter bias.
There are many positive aspects to the study as well. The study was performed in a building with physical attributes similar to those of a ship, such as large metal beams and walls, which provided a realistic test of the wireless system’s connectivity. The wireless system allowed the participants to move about the simulator bridge as they would on a ship. Subjects used U.S. Navy standard ship-handling commands exclusively, creating a more realistic scenario. Each subject performed multiple trials, allowing the system to learn between trials and providing a more realistic basis for comparison. The subjects’ multiple accents and speech styles provided a good baseline level of variation among participants.

H.  SUBJECTS 

Five subjects participated in the experiment over a five-day period. None of the subjects had significant speech impediments, illnesses or dental appliances affecting their speech. Table 5 lists the characteristics and qualifications of each subject. The asterisk denotes that the Surface Warfare designation had not been instituted when Subject D served in the Navy. The glossary, Appendix C, identifies the acronyms used in the table.
Trait | Subject A | Subject B | Subject C | Subject D | Subject E
MSI Simulation Experience | Instructor, 10 years | Instructor, 4 years | Computer Operator, 1 year | Computer Operator, 9 years | No
Naval Reserves | N/A | N/A | N/A | N/A | CDR
Retired U.S. Navy | CAPT | CAPT | LCDR | OSC (E-7) | No
At Sea Command Tour | 3 (1 O-5 & 2 O-6) | 2 (1 O-5 & 1 O-6) | No | N/A | 1 (Commercial)
Years in U.S. Navy | 30 | 30 | 18 | 20 | 18
Surface Warfare Officer | Yes | Yes | Yes | * | Yes
Sea Duty | 20 years | 13 years | 12 years | 15 years | 20 years
Commercial Mariner | No | No | No | No | 20 years
MSI Qualifications | Ship Handling Instructor; ARPA, ECDIS, BRM Instructor | Ship Handling Instructor; ARPA, ECDIS, BRM Instructor | None | Senior Simulation Computer Operator | N/A

Table 5. Subject Traits.
IV. DATA PREPARATION AND ANALYSIS OF EXPERIMENT RESULTS


A.  EXPERIMENT  SCENARIO 

On day one, the experiment setup began by comparing the equipment onsite at Marine Safety International (MSI) with the experiment equipment described in Chapter III. The MSI Technical Support Representative (TSR) noted that a special connector was necessary to complete the circuit between the simulator sound system, the laptop and the wireless microphone. Once the equipment was positioned and tested, it worked according to the manufacturer’s specifications. With setup complete, the list of seamanship terms in Appendix A was added to the global vocabulary; this was the last step before the subjects began creating their profiles, as described in Chapter III.

Subject D created a new profile using the SHURE wireless microphone because his previous profile had been made with a wired microphone. The need for a new profile arose when a difference in volume between the wired and wireless microphones was observed. Subjects B, C and E also created their speech profiles. The enrollment process took longer than anticipated because each subject had to record every seamanship term into his individual profile.

On day two, Subject A created a speech profile and performed the first trial. It was immediately apparent that the software was not recognizing most of the words spoken, because the speaker was saturating the microphone input level. Microphone volume saturation is indicated on the PC by a red line and must be avoided; otherwise the recorded sounds are distorted and much more difficult for the software to recognize. Subject A’s first trial was stopped. The TSR verified that the hardware connections were correct. A review of the troubleshooting chapter of the DNSV6.0 User’s Guide made it evident that there was a significant difference between the subject’s speaking volume during profile enrollment and in the simulator. Subject A spoke softly while reading the enrollment script but, when giving conning commands, increased his volume and spoke more forcefully to project his voice across the room, as if speaking to the helmsman in a “command voice.” This is a common reaction among first-time users and is considered a form of stage fright. [Ref. 42] The user subconsciously changes speaking style because of an awareness of being recorded, but reverts to a normal speaking volume and style in a more familiar and comfortable situation. As a result, Subject A repeated the entire enrollment process with instructions to speak in the same manner and at the same volume as when giving commands. Subject A has a strong New York accent, which did not appear to affect the experiment: his results in the following trials were satisfactory and comparable to those of the other subjects.

Subjects B, D and E performed at least three trials each during the week without any noteworthy incidents. Subject C performed his first trial at the console on the third day, after Subjects A, B and D had completed several trials. There were considerably fewer errors during this first trial than in any of the previous first trials. Three possible causes for the difference were considered: (a) decreased distance between the wireless microphone and the receiver, (b) the noise level in the simulator versus the console room, or (c) Subject C spoke more clearly than the other subjects. According to the TSR, a problem caused by the distance between the microphone and the receiver would appear as dropped words, not as incorrectly recognized words, so distance was not the cause. The answer became clearer when Subject C completed his first trial in the simulator: his recognition rate decreased slightly compared with the console room. The simulator is audibly louder than the console room, which decreases the speech recognition rate. The third possibility, that Subject C had a lower error rate than the other subjects regardless of scenario, setting or trial number, may also have contributed.




B.  DATA  PREPARATION 


The final data set consisted of 23 trials. The original data worksheets are included in Appendix B. Subjects A, B, C and D performed five trials apiece; Subject E performed only three trials because of schedule conflicts and time constraints. Table 6 presents the raw data: the number of errors divided by the total word count for each subject in each trial. These error rates are computed from aggregated error counts across the software and human error types discussed in Chapter III. As expected from the events described in the previous section, Subject A’s first trial is drastically different from the rest of the data. If included, this measurement may skew any statistical analysis of the data, so the results for Subject A, Trial 1, might need to be removed.
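Error rates of this kind can also be tallied automatically. The sketch below is a minimal illustration, assuming that each utterance’s intended command and the recognized text are both available as strings; it counts word-level substitutions, insertions and deletions (a word-level edit distance) and divides by the number of spoken words. The function names and sample utterances are hypothetical, and the sketch covers only software-side discrepancies, not the human error types of Chapter III, which this study recorded in a narrative log.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: the minimum number of substitutions,
    insertions and deletions turning the spoken words into the recognized text."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # spoken word missing from the text
                          curr[j - 1] + 1,     # extra word in the text
                          prev[j - 1] + cost)  # match or wrong word
        prev = curr
    return prev[-1]


def trial_error_rate(utterances):
    """Errors divided by total spoken word count for one trial.
    `utterances` is a list of (spoken, recognized) string pairs."""
    errors = sum(word_errors(spoken, text) for spoken, text in utterances)
    total_words = sum(len(spoken.split()) for spoken, _ in utterances)
    return errors / total_words


# Hypothetical utterances, not data from the experiment.
trial = [
    ("right standard rudder", "right standard rudder"),
    ("very well", "farewell"),
    ("all ahead two thirds", "all ahead to thirds"),
]
print(round(trial_error_rate(trial), 3))  # 3 errors / 9 words
```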


Errors/Total Word Count | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5
Subject A | 0.893 | 0.088 | 0.089 | 0.054 | 0.098
Subject B | 0.061 | 0.110 | 0.080 | 0.083 | 0.052
Subject C | 0.047 | 0.052 | 0.019 | 0.043 | 0.039
Subject D | 0.063 | 0.045 | 0.045 | 0.046 | 0.023
Subject E | 0.076 | 0.055 | 0.034 | N/A | N/A

Table 6. Raw Data Results.

1.  Data  Analysis  Requirements 

A few points need discussion before proceeding to the data analysis. Analysis of Variance (ANOVA) is the appropriate statistical tool and requires the response variable to be normally distributed. The principal performance measure for the voice recognition system is “error,” a zero-or-one response: for each word, the SRS either succeeded or failed in correctly interpreting the conning command. These are known as Bernoulli trials, and they yield overall error rates as proportions of the total word count. Such proportions are distinctly non-normal, because they are bounded between 0 and 1, whereas a normally distributed variable ranges from negative to positive infinity. Because the response is not normally distributed, the residuals of a basic model would also fail to meet this requirement, rendering ANOVA invalid.

Treating the proportion of incorrectly interpreted words as an estimator of an unknown population parameter θ, the probability of error in interpreting any word, the odds of failure provide an adequate way of characterizing system performance. Equation 2 represents the odds of error.

$$\text{Odds of error} = \frac{\theta}{1-\theta} \qquad (2)$$

The logit transform is the inverse of the logistic function: it takes an argument defined on the open interval (0, 1) and returns output ranging from negative to positive infinity. Taking the logarithm of the odds, the ratio of the numerator to the denominator in Equation 2, yields a variable that is positive for θ > 0.5, negative for θ < 0.5 and unbounded in both directions. [Ref. 43] The logit is defined as the natural logarithm of the odds of some event, where the odds are the probability that the event will occur divided by the probability that it will not occur. [Ref. 44] The structure of this transformation is expressed in Equation 3:


$$\operatorname{logit}(\theta_i) = \log\left(\frac{\theta_i}{1-\theta_i}\right), \quad \text{for each run } i \qquad (3)$$


where  the  outcomes  are  a  function  of  the  explanatory  variables  based  on  the 
expectations  stated  in  Chapter  III.  The  logit  transform  yields  a  table  of  values  for 
the  log  of  the  “odds  of  the  SRS  making  an  error  during  trial  i.”  These  transformed 
values,  presented  in  Table  7,  form  the  basis  for  the  data  analysis  and  enable 
more  appropriate  use  of  ANOVA. 
Logit of Error Rate | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5
Subject A | 2.1216 | -2.3445 | -2.3308 | -2.855 | -2.2208
Subject B | -2.73 | -2.0868 | -2.4375 | -2.3979 | -2.9124
Subject C | -3.0123 | -2.8959 | -3.9671 | -3.091 | -3.2155
Subject D | -2.6931 | -3.0511 | -3.0621 | -3.0258 | -3.7485
Subject E | -2.5014 | -2.8478 | -3.3322 | N/A | N/A

Table 7. Logit Transform Values.
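The transform in Equation 3 can be applied to Table 6 directly. This minimal sketch assumes the natural logarithm, as stated above. Because Table 6 is rounded to three decimals, the results agree with Table 7 only to about the second decimal place, with the largest gaps for the smallest proportions (for example, about -3.944 versus -3.9671 for Subject C, trial 3).

```python
import math

# Error proportions (errors / total word count) from Table 6.
table6 = {
    "A": [0.893, 0.088, 0.089, 0.054, 0.098],
    "B": [0.061, 0.110, 0.080, 0.083, 0.052],
    "C": [0.047, 0.052, 0.019, 0.043, 0.039],
    "D": [0.063, 0.045, 0.045, 0.046, 0.023],
    "E": [0.076, 0.055, 0.034],
}


def logit(theta: float) -> float:
    """Natural log of the odds theta / (1 - theta); defined for 0 < theta < 1."""
    if not 0.0 < theta < 1.0:
        raise ValueError("logit is only defined on the open interval (0, 1)")
    return math.log(theta / (1.0 - theta))


table7_approx = {s: [round(logit(p), 4) for p in ps] for s, ps in table6.items()}
for subject, values in table7_approx.items():
    print(subject, values)
```

Only Subject A’s first trial, with an error rate above 0.5, produces a positive value.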


2.  Influential  Observation 


An influential observation is any case (in this study, a trial) whose presence causes major changes in the results. [Ref. 45] Influential cases may become evident when examining a normal quantile plot. Normally distributed data produce a quantile plot whose points run from the lower left corner to the upper right corner along an approximately straight line. [Ref. 46] A plot of the overall activity as a function of subject, trial, setting and scenario yielded the normal quantile plot shown in Figure 5. It shows that a single trial yielded an error rate greater than 0.5 and therefore a positive logit value. All other points are negative because their error rates are below 0.5. As discussed in a previous section, this outcome was an anomaly. The plot clearly demonstrates that the data are not normally distributed. Furthermore, the extreme nature of this observation raises concern that it may distort the explanatory model, making it a candidate for removal as an influential observation.
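The points of a normal quantile plot such as Figures 5 and 7 can be computed with the Python standard library (Python 3.8 or later for `NormalDist`). The sketch below is illustrative only: it applies the plot to the Table 7 logit values themselves rather than to the thesis’s model of subject, trial, setting and scenario, uses the common plotting positions (i - 0.5)/n (an assumption; the original software’s choice is not stated), and summarizes straightness with the correlation of the plotted points.

```python
from statistics import NormalDist, mean

# Table 7 logit values, Subject A trial 1 first.
full = [2.1216, -2.3445, -2.3308, -2.855, -2.2208,
        -2.73, -2.0868, -2.4375, -2.3979, -2.9124,
        -3.0123, -2.8959, -3.9671, -3.091, -3.2155,
        -2.6931, -3.0511, -3.0621, -3.0258, -3.7485,
        -2.5014, -2.8478, -3.3322]
trimmed = full[1:]  # drop Subject A, trial 1


def normal_quantile_points(values):
    """(standard normal quantile, ordered observation) pairs using the
    plotting positions (i - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), y)
            for i, y in enumerate(ordered, start=1)]


def straightness(points):
    """Pearson correlation of the plotted points; 1.0 is a perfectly straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


for label, values in (("full", full), ("trimmed", trimmed)):
    print(label, round(straightness(normal_quantile_points(values)), 3))
```

Removing Subject A’s first trial raises the correlation, mirroring the visual difference between the two plots.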
Figure 5. Quantiles of Standard Normal with Trials 1, 4, and 6 Marked as the Most Significant.

The most irregular point in this plot is the first one, Subject A, trial one. It deviates significantly from the overall pattern of observations and strongly influences the data set. This is problematic for two reasons. First, it is not characteristic of the overall performance observed throughout the rest of the experiment, for the reasons already explained. Second, it would unduly alter the conclusions suggested by the data set. To determine how much influence Subject A’s first trial has on the data set, Cook’s Distance was calculated. Cook’s Distance measures the difference between the regression estimates computed with the abnormal point and those computed without it. [Ref. 47]

Essentially, Cook’s Distance compares model outcomes as each observation is removed in turn. Points whose removal most markedly changes the model’s predictions yield a high value of Cook’s Distance, D. The greater the value of D, the more the observation changes the model, which is undesirable. [Ref. 48] Figure 6 shows Cook’s D for each observation and its relative influence over the rest of the analysis. The problematic first observation for Subject A has by far the highest value of Cook’s D and is marked by its trial number on the plot.
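Cook’s Distance can be computed by brute force, refitting the model once per omitted observation. In this minimal sketch the model is assumed to have subject as its only factor, because the setting and scenario codes used in the thesis model are not tabulated in this section. With that simplification, D for an observation is the summed squared change in all fitted values when it is left out, divided by p times the full-model mean squared error, where p is the number of fitted subject means. Only the dominance of Subject A’s first trial should be expected to carry over to Figure 6, not the other marked points.

```python
from statistics import mean

# Table 7 logit values; labels such as "A1" mean Subject A, trial 1.
TABLE7 = {
    "A": [2.1216, -2.3445, -2.3308, -2.855, -2.2208],
    "B": [-2.73, -2.0868, -2.4375, -2.3979, -2.9124],
    "C": [-3.0123, -2.8959, -3.9671, -3.091, -3.2155],
    "D": [-2.6931, -3.0511, -3.0621, -3.0258, -3.7485],
    "E": [-2.5014, -2.8478, -3.3322],
}
obs = [(f"{s}{t}", s, y) for s, ys in TABLE7.items() for t, y in enumerate(ys, start=1)]


def subject_means(observations):
    """Fitted values of a one-way (subject-only) model are the subject means."""
    groups = {}
    for _, subject, y in observations:
        groups.setdefault(subject, []).append(y)
    return {subject: mean(ys) for subject, ys in groups.items()}


def cooks_distance(observations):
    """Cook's D for each observation by refitting without it.
    Assumes every subject has at least two observations."""
    full_fit = subject_means(observations)
    p = len(full_fit)                      # one fitted mean per subject
    n = len(observations)
    mse = sum((y - full_fit[s]) ** 2 for _, s, y in observations) / (n - p)
    distances = {}
    for k, (label, _, _) in enumerate(observations):
        refit = subject_means(observations[:k] + observations[k + 1:])
        # Squared change in every fitted value caused by dropping observation k.
        shift = sum((full_fit[s] - refit[s]) ** 2 for _, s, _ in observations)
        distances[label] = shift / (p * mse)
    return distances


d = cooks_distance(obs)
for label in sorted(d, key=d.get, reverse=True)[:3]:
    print(label, round(d[label], 3))
```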


Figure  6.  Cook's  Distance  with  Trials  1, 4,  and  19  Marked  as  Significant. 


For these reasons, further analysis omits this point and uses a trimmed data set, denoted by the prefix “tr.” The term ‘trimmed’ in the label of a table or plot indicates that this data point was removed. The Standard Normal Quantile plot in Figure 7 shows that, without Subject A’s abnormal data point, the trimmed data follow a reasonably normal distribution. Now that the data are more nearly normally distributed, ANOVA may be performed on the trimmed data set.
3. ANOVA Methodology


The ANOVA methodology assesses a model’s significance by partitioning the variation in a performance measure into explained and unexplained components. For the performance measure in this study, SRS error, the partition is expressed in Equation 4:

Total  SRS  Variation  =  Explained  Variation  +  Unexplained  Variation.  [Ref.  49]  (4) 

Variation is measured by the distance between the data points and their mean value, where distance is determined by squaring the differences in values. Summing these squared differences gives the sums of squares in Equation 5:

Sum  of  Sq  (Total)  =  Sum  of  Sq  (Model)  +  Sum  of  Sq  (Residuals).  (5) 

Dividing these sums of squares by the appropriate degrees of freedom (Df) yields the mean squares for the model and the residuals. The ratio of the two mean squares is an F-statistic, which measures the mean amount of variation explained by the model relative to the mean amount of unexplained variation. For the F-statistic to be appropriate, the data must be normally distributed. [Ref. 50] The trimmed data satisfy that requirement, as depicted in Figure 7.

After the F-statistic is computed from the observed data and compared with the known F distribution, the analysis yields a P-value. The P-value is the probability of observing results at least as extreme as those seen during the experiment, given that the null hypothesis is true. The null hypothesis states that an explanatory variable has no effect on the performance responses of the study; that is, there is no difference among model groups. The entire ANOVA methodology, including sums of squares, degrees of freedom, mean squares, F-statistic and P-value, is summarized in an ANOVA table for each model associated with the five expectations identified in Chapter III.
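The sums of squares, degrees of freedom, mean squares and F-statistic described above can be sketched in a few lines of standard-library Python. This is a minimal illustration for a one-way model, not the software used in the thesis; the P-value would come from the upper tail of the F distribution with (df_model, df_resid) degrees of freedom, for example via `scipy.stats.f.sf`, which is left out here to keep the sketch dependency-free.

```python
from statistics import mean

# Trimmed Table 7 logit values grouped by subject (Subject A, trial 1 removed).
trimmed_by_subject = {
    "A": [-2.3445, -2.3308, -2.855, -2.2208],
    "B": [-2.73, -2.0868, -2.4375, -2.3979, -2.9124],
    "C": [-3.0123, -2.8959, -3.9671, -3.091, -3.2155],
    "D": [-2.6931, -3.0511, -3.0621, -3.0258, -3.7485],
    "E": [-2.5014, -2.8478, -3.3322],
}


def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares and F for a one-way model."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(values)
    group_means = {g: mean(ys) for g, ys in groups.items()}
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    # Explained: spread of the group means around the grand mean.
    ss_model = sum(len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in groups.items())
    # Unexplained: spread of each observation around its own group mean.
    ss_resid = sum((y - group_means[g]) ** 2 for g, ys in groups.items() for y in ys)
    df_model = len(groups) - 1
    df_resid = len(values) - len(groups)
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    return {
        "ss_total": ss_total, "ss_model": ss_model, "ss_resid": ss_resid,
        "df_model": df_model, "df_resid": df_resid,
        "ms_model": ms_model, "ms_resid": ms_resid,
        "F": ms_model / ms_resid,
    }


result = one_way_anova(trimmed_by_subject)
for key, value in result.items():
    print(key, round(value, 4))
```

Applied to the trimmed Table 7 values grouped by subject, the sketch returns the same degrees of freedom as Table 8 (4 and 17) and sums of squares close to, but not identical with, those reported there (about 2.35 and 2.31 versus 2.344 and 2.283).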




[Figure: normal quantile plot of the trimmed data; axis label "Quantiles of Standard Normal"]

Figure 7. Trimmed Results Plot.

4.  Inference  Testing 

a.  Expectation  1 

•  Individual  Subject  Accounts  for  Much  of  the  Variability  in 
Software  Performance 

As noted earlier, there was distinct variation among the subjects’ performances. Some subjects’ results were similar, but other subjects had several more or several fewer errors, which indicates that the null hypothesis, “there is no difference in software performance due to the subject,” should be rejected. The analysis of variance yielded a P-value of 0.013, shown in Table 8, which confirms the significance of these observations.
H1 | Df | Sum of Sq | Mean Sq | F Value | Pr(F)
tr.subject | 4 | 2.344044 | 0.586011 | 4.36397 | 0.01309
Residuals | 17 | 2.282828 | 0.134284 | |

Table 8. P-Value for Expectation 1.
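As a check on Table 8, each mean square is its sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and the share of the total sum of squares attributable to subject is the basis for the statement later in this subsection that subject explains over 50% of the variability:

$$\mathrm{MS}_{\text{subject}} = \frac{2.344044}{4} = 0.586011, \qquad \mathrm{MS}_{\text{residual}} = \frac{2.282828}{17} = 0.134284$$

$$F = \frac{0.586011}{0.134284} \approx 4.364, \qquad \frac{\mathrm{SS}_{\text{subject}}}{\mathrm{SS}_{\text{total}}} = \frac{2.344044}{2.344044 + 2.282828} \approx 0.507$$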


Figure 8 provides another confirmation of these observations by comparing each subject’s performance directly with every other subject’s. It shows simultaneous 95 percent confidence intervals for the difference in performance between each pair of subjects. If an interval includes zero, there is no distinguishable difference in performance at 95% confidence. Note the first line, A-B. These subjects’ overall outcomes were similar and the center point is close to zero; the 95% confidence interval includes zero, meaning there is no distinguishable difference in performance. For A-C, by contrast, the center point lies to the right at about 0.8 and the interval does not include zero, meaning there is a distinguishable difference in performance at the 95% level of confidence.


[Figure: pairwise comparisons A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E; simultaneous 95% confidence limits, Bonferroni method; response variable: tr.log.conning.error]

Figure 8. Subject Error Performance Similarities.

The further a comparison’s center is from zero, the greater the difference in performance between the subjects. Subjects A and D performed quite differently, but not as differently as Subjects A and C. Because Subjects A and B performed similarly, the comparison between Subjects B and C is very similar to the comparison between Subjects A and C. Differences among the subjects explain over 50% of the variability in the SRS error rate. The results of this model confirm the first expectation.
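The pairwise comparisons in Figure 8 can be approximated with Bonferroni-adjusted intervals built on the pooled residual mean square of the one-way subject model. This sketch is an assumption-laden reconstruction, not the routine that produced Figure 8: it uses the trimmed Table 7 values, and it obtains the t critical value by Simpson’s-rule integration and bisection so that only the standard library is needed (in practice `scipy.stats.t.ppf` would serve). With ten pairs, each interval uses a two-sided level of 0.05/10.

```python
import math
from itertools import combinations
from statistics import mean

# Trimmed Table 7 logit values grouped by subject (Subject A, trial 1 removed).
trimmed_by_subject = {
    "A": [-2.3445, -2.3308, -2.855, -2.2208],
    "B": [-2.73, -2.0868, -2.4375, -2.3979, -2.9124],
    "C": [-3.0123, -2.8959, -3.9671, -3.091, -3.2155],
    "D": [-2.6931, -3.0511, -3.0621, -3.0258, -3.7485],
    "E": [-2.5014, -2.8478, -3.3322],
}


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(prob, df):
    """Bisection for the upper quantile; assumes 0.5 <= prob < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def bonferroni_intervals(groups, level=0.95):
    """Simultaneous confidence intervals for all pairwise differences in group means."""
    n = sum(len(ys) for ys in groups.values())
    df = n - len(groups)
    means = {g: mean(ys) for g, ys in groups.items()}
    mse = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys) / df
    pairs = list(combinations(sorted(groups), 2))
    alpha = 1 - level
    t_crit = t_quantile(1 - alpha / (2 * len(pairs)), df)
    intervals = {}
    for a, b in pairs:
        diff = means[a] - means[b]
        half_width = t_crit * math.sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        intervals[f"{a}-{b}"] = (diff - half_width, diff, diff + half_width)
    return intervals


for pair, (low, center, high) in bonferroni_intervals(trimmed_by_subject).items():
    print(pair, round(low, 3), round(center, 3), round(high, 3))
```

The interval centers, about 0.08 for A-B and 0.80 for A-C, match the values described for Figure 8; because the residual mean square here comes from Table 7 and differs slightly from Table 8, limits that fall close to zero should be read with care.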

b.  Expectation  2 

•  Successive  Trials  for  Individuals  Will  Yield  Better  System 
Performance 

Throughout the experiment, the error rate was expected to decline with each successive trial per subject, but this expectation was not borne out. Instead, the error rate fluctuated up and down from trial to trial, regardless of the subject. This was due to inconsistent enforcement of experiment controls. To avoid recording comments that were irrelevant to conning but important to the simulation, the subjects tried various actions, including turning the microphone off, moving it away from their mouths and covering it. Each attempt inevitably led to a software error.

When the microphone was turned on again, the subject would speak before the wireless system engaged, so words were not recorded. When subjects tried to move or cover the microphone, it would get bumped, and the noise from the contact produced additional words. Other contact errors occurred when a subject unknowingly scratched his face, coughed or rubbed his nose.

Another issue was the introduction of new words. The subjects used
vocabulary not previously incorporated into their profiles or the global
vocabulary. This increased the number of Software Type 2 errors, in which the
“software recognizes the wrong word when the correct word is not in the
vocabulary.” The P-value in Table 9 (0.707) indicates that results like those
observed would be quite likely if the null hypothesis were true, suggesting that
trial number was inconsequential.
H2            Df   Sum of Sq   Mean Sq    F Value   Pr(F)
tr.trial       4   0.523838    0.130959   0.5426    0.70668
Residuals     17   4.103034    0.241355

Table 9. P-Value for Expectation 2.
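The F value in Table 9 is simply the ratio of two mean squares, each being a sum of squares divided by its degrees of freedom. The minimal sketch below re-derives the Mean Sq and F Value columns from the transcribed Df and Sum of Sq entries; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class AnovaRow:
    """One row of an ANOVA table: degrees of freedom and sum of squares."""
    df: int
    sum_sq: float

    @property
    def mean_sq(self) -> float:
        # Mean square is the sum of squares divided by its degrees of freedom.
        return self.sum_sq / self.df


def f_value(effect: AnovaRow, residual: AnovaRow) -> float:
    """F statistic: ratio of the effect mean square to the residual mean square."""
    return effect.mean_sq / residual.mean_sq


# Values transcribed from Table 9 (trial as the only factor).
trial = AnovaRow(df=4, sum_sq=0.523838)
residuals = AnovaRow(df=17, sum_sq=4.103034)

print(f"trial MS = {trial.mean_sq:.6f}")
print(f"residual MS = {residuals.mean_sq:.6f}")
print(f"F = {f_value(trial, residuals):.4f}")
```

An F value well below 1, as here, means the variation between trials is smaller than the variation within them. Turning F into the Pr(F) column requires the F distribution with (4, 17) degrees of freedom, for example the `sf` method of `scipy.stats.f` in SciPy, which the sketch does not use.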


Figure 9 illustrates the performance comparison of the trials. The model
shows that the trials did not improve successively but remained relatively
consistent. All the data points are clustered around zero, indicating no
distinguishable difference in performance from one trial to another. Moreover,
sequential trials showed no positive trend: from trial 1 to 2, 2 to 3, 3 to 4, and 4
to 5, there was no consistently positive comparison of SRS response.
Figure 9. Trial Comparison.

c.  Expectation  3 

•  There  Is  No  Significant  Difference  in  System  Performance  Due 
to  Operational  Scenario 

The scenario and vessel used in each trial varied. All three scenarios
(mooring, channel, and underway) use the same commands and verbiage. The
vessel type changed, but it had no bearing on the study. The ambient noise,
however, does vary between scenarios. As mentioned in Chapter III, Table 4, the
simulator's noise level increases as the vessel moves faster. Therefore, the noise
level while leaving the channel is louder than while mooring, and the noise level
while underway is louder than while leaving the channel. Based on this
information, one might expect the most errors during the underway scenario and
the fewest during the mooring scenario. The results did not show any major
differences between the scenarios. The P-value in Table 10 (0.503) indicates
roughly a 50 percent probability of observing results at least as extreme as these
if scenario had no effect; consequently, scenario is not significant.


H3              Df   Sum of Sq   Mean Sq    F Value   Pr(F)
tr.scenario      2   0.322484    0.161242   0.71174   0.50342
Residuals       19   4.304387    0.226547

Table 10. P-Value for Expectation 3.
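Besides the P-value, it can help to see what share of the total variability each factor accounts for. The sketch below assumes eta-squared (effect sum of squares divided by total sum of squares) as that summary; the measure is not used in the thesis itself. The total sum of squares is the same, up to rounding (4.626872), in Tables 9, 10, 13 and 14, because every model partitions the same response. Subject is entered first in Table 13, so its sum of squares there equals the one-factor subject value.

```python
def eta_squared(effect_ss: float, total_ss: float) -> float:
    """Proportion of the total sum of squares attributed to one factor."""
    return effect_ss / total_ss


# Total sum of squares of the transformed error rate (Table 9 rows added together).
TOTAL_SS = 0.523838 + 4.103034  # 4.626872

# Effect sums of squares transcribed from the tables.
EFFECT_SS = {
    "trial (Table 9)": 0.523838,
    "scenario (Table 10)": 0.322484,
    "subject (Table 13, first term)": 2.344044,
}

for name, ss in EFFECT_SS.items():
    print(f"{name:32s} {eta_squared(ss, TOTAL_SS):.3f}")
```

Trial and scenario account for only about 0.11 and 0.07 of the variability, while subject alone accounts for about 0.51, which agrees with the earlier statement that differences between subjects explain over 50% of the variability in the SRS error rate.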

d.  Expectation  4 


•  Setting  Affects  System  Performance 

The setting, console room versus simulator, has a crucial bearing on the
SRS error rate. As noted previously, the noise levels in the two locations are
very different, with the simulator having considerably more ambient noise than
the console room, although the replicated sounds from the simulator could be
heard in the console room during the underway scenario. “Dragon
NaturallySpeaking® performs best in a quiet room.” [Ref. 51] During the trials, the
increased noise level in the simulator appeared to decrease the recognition rate
only slightly; analysis showed that the decrease was in fact substantial.


                         Difference in   Standard   Lower     Upper
                         Performance     Error      Bound     Bound
Console vs. Simulator    -0.468          0.18       -0.844    -0.0911

Table 11. Ninety-Five Percent Confidence Interval (t = 2.086).


Because this expectation compares only two sets of data, the two-sample
t-test is appropriate. [Ref. 52] The 95 percent confidence interval in Table 11 is
noteworthy in that the range between its lower and upper bounds does not
include zero, which signifies a significant difference between outcomes from the
two settings. The confidence interval corroborates the observations made during
the study and leads to rejection of the null hypothesis that the setting does not
affect the results. Converted back by an inverse of the original logit transform,
the upper and lower bounds equate to a difference in actual error rate of
between .01 and .04.
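The interval in Table 11 has the usual two-sided form: estimate plus or minus the critical t value (2.086, the 97.5th percentile of a t distribution with 20 degrees of freedom) times the standard error. The sketch below re-derives the bounds from the rounded table entries, so it reproduces them only to within about 0.002; the zero check mirrors the significance argument above.

```python
T_CRIT = 2.086      # critical t value given in the Table 11 caption
ESTIMATE = -0.468   # difference reported for Console vs. Simulator (transformed scale)
STD_ERROR = 0.18    # standard error from Table 11 (rounded)


def confidence_interval(estimate: float, std_error: float, t_crit: float):
    """Two-sided interval: estimate plus or minus t_crit standard errors."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


lower, upper = confidence_interval(ESTIMATE, STD_ERROR, T_CRIT)
# The difference is significant at the 5% level when zero lies outside the interval.
excludes_zero = not (lower <= 0.0 <= upper)
print(f"95% CI: ({lower:.3f}, {upper:.3f}); excludes zero: {excludes_zero}")
```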


e.  Expectation  5 

•  Variation in System Performance May Be Associated With an
Interaction of Subject, Simulation and Scenario

The last issue of concern was whether any combination of variables
caused a significant effect. The subjects were given wide latitude during testing,
raising concern about interactions among the variables. The subjects determined
what they said, where they conned from and, as remarked upon in Section 3.C,
the scenario and vessel used. This latitude led to further scrutiny of the data.

The results from the first four expectations signaled the need to examine
possible interaction effects between the variables. During the study, the overall
impression was that the subjects, and how well they trained the system, had the
greatest influence on the accuracy rate. The combination of scenario and subject
seemed a low priority, since the vocabulary was expected to remain the same for
all trials. The first step considers the cumulative statistics of the Full Model, which
accounts for all the factors and the interaction between subject and scenario.
Table 12 shows the results: the P-value for the scenario-subject interaction
(0.045) was unexpectedly significant. The significant P-value of the subject is not
offset by the nonsignificant P-value of the scenario, so the null hypothesis is
rejected. Together, all of the variables plus the scenario-subject interaction
account for 80% of the variability in SRS performance (adjusted R²). In other
words, all of these factors play a role in explaining variability in SRS
performance.


                  tr.trial  tr.setting  tr.scenario  tr.subject  tr.scenario:tr.subject  Residuals
Sum of Squares    0.523838  0.766753    0.06032      2.211054    0.809451                0.255456
Deg. of Freedom   4         1           2            4           4                       6

Residual Std Error: 0.2063396
4 out of 20 effects not estimable
Estimated effects are unbalanced

                         Df   Sum of Sq   Mean Sq     F Value    Pr(F)
tr.trial                  4   0.523838    0.1309594   3.07589    0.106258
tr.setting                1   0.766753    0.7667534   18.00903   0.0054176
tr.scenario               2   0.06032     0.0301599   0.70838    0.5294343
tr.subject                4   2.211054    0.5527634   12.98297   0.0040987
tr.scenario:tr.subject    4   0.809451    0.2023628   4.75297    0.0452828
Residuals                 6   0.255456    0.042576

Table 12. P-Value for Expectation 5 (Full Model Including Scenario-Subject
Interaction).
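The 80% figure is an adjusted R², which, under its standard definition, penalizes the fit for the number of model terms: one minus the residual mean square divided by the total mean square. Assuming that definition, the sketch below reproduces the figure from the Df and Sum of Sq columns of Table 12, and applies the same function to Table 14 for comparison.

```python
def adjusted_r_squared(terms, residual):
    """Adjusted R^2 from an ANOVA table.

    terms: list of (df, sum_sq) pairs for the model terms.
    residual: (df, sum_sq) pair for the residuals.
    """
    df_total = sum(df for df, _ in terms) + residual[0]
    ss_total = sum(ss for _, ss in terms) + residual[1]
    residual_ms = residual[1] / residual[0]
    total_ms = ss_total / df_total
    return 1.0 - residual_ms / total_ms


# Table 12: trial, setting, scenario, subject, scenario:subject.
table12_terms = [(4, 0.523838), (1, 0.766753), (2, 0.06032),
                 (4, 2.211054), (4, 0.809451)]
table12_resid = (6, 0.255456)

# Table 14: subject, setting, trial, setting:trial.
table14_terms = [(4, 2.344044), (1, 0.345104), (4, 0.421589), (4, 0.295939)]
table14_resid = (8, 1.220196)

a12 = adjusted_r_squared(table12_terms, table12_resid)
a14 = adjusted_r_squared(table14_terms, table14_resid)
print(f"Table 12 adjusted R^2: {a12:.3f}")
print(f"Table 14 adjusted R^2: {a14:.3f}")
```

The Table 12 value comes out near 0.81, and the Table 14 value near 0.31, which appears consistent with the "less than 33%" reported for the setting-trial model.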


The interaction between subject and setting is of great interest because
these two factors emerged as the most significant. The P-values from the
previous single-factor models indicated that both subject and setting are
important. The question is whether knowing a subject's setting provides any
additional insight: if both factors are individually important, perhaps their
interaction is also important. The P-value of the combined variables in Table 13
(0.524) shows that knowing which setting the subject conned from is not
statistically significant, and adding this interaction yielded no better explanation
of SRS performance. The results do not allow rejection of the null hypothesis of
no variation in system performance due to this interaction.
                  tr.subject  tr.setting  tr.subject:tr.setting  Residuals
Sum of Squares    2.344044    0.345104    0.425363               1.512361
Deg. of Freedom   4           1           4                      12

Residual standard error: 0.3550071
Estimated effects may be unbalanced

                        Df   Sum of Sq   Mean Sq     F Value    Pr(F)
tr.subject               4   2.344044    0.586011    4.649771   0.0169047
tr.setting               1   0.345104    0.3451036   2.738264   0.12387
tr.subject:tr.setting    4   0.425363    0.1063408   0.843773   0.5237524
Residuals               12   1.512361    0.1260301

Table 13. P-Value for Expectation 5 (Subject * Setting).


The final model assessed the interaction between setting and trial. At
MSI, the setting was chosen arbitrarily for any given trial. Some subjects stood in
the simulator, while others stood or sat in the console room. At the time, the
location was worth noting but not of interest. As the P-value in Table 14 (0.747)
shows, the setting-trial interaction has little or no significance for SRS
performance. This model also suggests that setting has less value as a predictor
of system performance (P = 0.171), and the model accounts for less than 33% of
the collective variation in the SRS error rate.


                  tr.subject  tr.setting  tr.trial  tr.setting:tr.trial  Residuals
Sum of Squares    2.344044    0.345104    0.421589  0.295939             1.220196
Deg. of Freedom   4           1           4         4                    8

Residual Std. Error: 0.3905439
Estimated effects may be unbalanced

summary(tr.subject.trial.setting.aov)

                      Df   Sum of Sq   Mean Sq     F Value    Pr(F)
tr.subject             4   2.344044    0.586011    3.842076   0.049869
tr.setting             1   0.345104    0.3451036   2.26261    0.170942
tr.trial               4   0.421589    0.1053972   0.691018   0.618499
tr.setting:tr.trial    4   0.295939    0.0739847   0.485068   0.747094
Residuals              8   1.220196    0.1525246

Table 14. P-Value for Expectation 5 (Setting * Trial).
C.  EXPERIMENTAL DESIGN AND IMPLEMENTATION LESSONS LEARNED


Throughout the experiment and subsequent analysis, it became apparent
that improvements in the design or implementation of future experiments would
yield better results. Changes in experiment implementation that seemed harmless
at the time contributed unexplained variability to the data, and the results made it
clear that these changes affected the study. The following list contains a few of
the key lessons learned.

•  Begin each trial with a standard phrase to allow the software to
engage.

•  Use the speaking style appropriate to the task while creating the
speech profile. This reduces errors and avoids the need to
recreate the profile.

•  Ensure each subject speaks 100% of the vocabulary in the first
trial. Additional words unknown to the lexicon result in errors and
distort the subsequent software learning process.

•  Ensure all subjects perform the same number of trials so that the
data set is balanced for analysis.

•  Wait approximately two seconds after the wireless microphone is
turned on before speaking. There is a slight delay before it begins
transmitting the signal to the software, and speaking during this
delay results in errors.

•  Do not make contact with the microphone during recording. The
software constantly seeks to create a word, so any noise activates
the software and adds unwanted words to the text.

•  Keep spare batteries available at all times for the microphone or
invest in a rechargeable battery pack. The wireless system needs
new batteries regularly; the manufacturer states the battery lasts
eight to nine hours. Observe the indicator on the system to ensure
the battery does not die during use.

•  Copy and save the original transcript before making corrections.
The original copy contains all the errors, while the corrected copy
has what the conning officer actually said.
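Keeping both transcripts makes it possible to count recognition errors automatically. One common measure, assumed here for illustration and not necessarily the error-rate definition used in this study, is word error rate: the minimum number of word substitutions, deletions and insertions separating the corrected transcript from the original one, divided by the number of words in the corrected transcript. The function name and the example phrases are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    reference: corrected transcript (what the conning officer actually said).
    hypothesis: original, uncorrected SRS transcript.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Dynamic programming, one row at a time: before processing reference word i,
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,                 # deletion
                          curr[j - 1] + 1,             # insertion
                          prev[j - 1] + substitution)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("all ahead two thirds", "all ahead to thirds"))        # 0.25
print(word_error_rate("steady as she goes", "steady as she goes sir the"))  # 0.5
```

The first example is a single substitution of a similar-sounding word; the second shows two spurious words added, as happened when the microphone was bumped.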

The key lessons learned about implementing an experiment are corrective
actions to reduce the opportunity for disruptions or errors in future studies.
Issues arose throughout the study that had not occurred during the pre-test
phase, requiring small adjustments in the experiment process. For example,
when pre-testing the microphone, there had been a sufficient delay before
speaking to allow the system to engage. This was not pre-planned, but occurred
naturally. Also, the need for batteries may present a challenge on a ship;
rechargeable batteries are a more economical and space-saving alternative. In
addition, there are several types of wireless microphones on the market, and
additional research is necessary to determine which one is best suited for the
shipboard environment.

Overall, the experiment provided useful data concerning the use of
Commercial-Off-The-Shelf speech recognition software for conning ships. A
better-informed experimental design might have produced more normally
distributed data and a more conclusive analysis of DNSV6.0, since numerous
factors influence speech recognition software performance, such as subject,
trial, setting, scenario, vessel, and possible interactions.

In this analysis, some interactions emerged as significant, making a
randomized block design the most appropriate for future studies. Firm control
over noise factors such as spurious verbal sounds and microphone adjustments
will provide more refined data. However, these last two noise factors are
inherent characteristics of human behavior that must be considered during
system design.
V.  CONCLUSIONS  AND  RECOMMENDATIONS


A.  ANALYTICAL  CONCLUSIONS 

This experiment was the first feasibility study of commercial-off-the-shelf
(COTS) speech recognition software as a tool for conning U. S. warships, and it
yielded important insight into SRS performance and into the design of further
studies of this system. The error rate, size of the vocabulary, and user enrollment
are key design considerations in adopting this technology.

The research provides quantitative evidence that the SRS error rate is
strongly dependent on the user. Users having difficulty achieving acceptable
error rates are encouraged to train the software more thoroughly. The error rate
is moderately affected by the surrounding ambient noise, but this effect can be
minimized by creating the user profile in the noise environment in which it will be
operated and by using noise-dampening hardware.

The study emphasized the need for a focused and limited, yet complete,
ship-handling vocabulary or lexicon. DNSV6.0 has a large vocabulary, which
creates more opportunity for poor recognition and is a significant drawback. It
also has the ability to learn new words and to create special vocabularies, which
is a positive trait. The SRS's insistence on proper grammar added words and
created misinterpretations as it attempted to meet its predefined office rules.
During testing, the SRS “learned” the new rules required for conning within five
trials.

As mentioned earlier, the user is the most significant factor in the success
or failure of SRS, and successful user enrollment is the keystone of the process.
Subject A of the study demonstrated how an erroneous enrollment can have
detrimental effects on the resulting SRS accuracy rate. Users should be
reminded to speak normally during enrollment, using the same speech pattern,
volume and speed they would use in the intended situation.

The study also revealed some important points about the wireless
microphone. Microphone position influences operational capability: the simple
act of rotating the microphone upwards, toward the temple, completely stopped
speech transmission. This demonstrated both the high quality of the
noise-dampening feature built into the microphone and the need for correct
positioning. The wireless system is power-intensive, requiring frequent battery
changes, but it does have an indicator showing the user its current status.
Users' attempts to use the microphone on-off switch caused an unforeseen
problem: the delay from the time the microphone was turned on until it began
transmitting the signal caused words to go unrecognized. Once aware of the
delay, the subjects had no additional problems.

B.  IMPACT  OF  THIS  STUDY 

The U. S. Navy’s transformation and vision to reduce future ship size and
manning requirements indicate a need for more technology to perform the
functions currently performed by Sailors. The Voice Activated Command System
is part of the design concept of the Integrated Bridge System (IBS). This concept
seeks to develop technological alternatives that support safe and sound
ship-handling. There are many engineering alternatives for incorporating
technology and reducing manpower that preserve reliability and maintain high
confidence levels, but SRS is a readily available and viable option today.

The study demonstrated that basic speech recognition software is suitable
for testing and incorporation in future IBS designs. Additional issues not covered
in this thesis must be addressed during the design process. They include the use
of speaker recognition capabilities to grant certain individuals, such as the
Commanding Officer, specific rights not afforded to general bridge personnel.
Another issue is the ability to engage and disengage the microphone. Some
systems use a button while others use a keyword. The COTS SRS used in this
study uses a keyword, “microphone off”, to disengage the microphone, but the
microphone must be turned on manually. This is not practical for a conning
officer, who must also speak to bridge personnel about matters other than
actually driving the ship. One COTS SRS can put the microphone into a sleep or
standby mode when a disengage keyword such as “Go to sleep” or “Stop
Listening” is spoken, and then wait and listen for an activation keyword such as
“Wake Up” or “Listen to me”. [Ref. 53] More appropriate keywords for a ship’s
bridge would be “Helmsman” to activate recording and “Very well” to deactivate
recording.
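The proposed keywords amount to a two-state gate: asleep, the system ignores bridge conversation until it hears the activation keyword; awake, it passes utterances on as commands until it hears the deactivation keyword. The minimal sketch below assumes the recognizer already delivers one utterance per string; the function names and example utterances are hypothetical and do not represent any SRS product's API.

```python
ACTIVATE = "helmsman"     # proposed keyword to start passing commands
DEACTIVATE = "very well"  # proposed keyword to stop passing commands


def normalize(utterance: str) -> str:
    """Lower-case, drop commas and periods, and collapse whitespace."""
    cleaned = utterance.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def gate_commands(utterances):
    """Return only the utterances spoken between ACTIVATE and DEACTIVATE."""
    listening = False
    forwarded = []
    for raw in utterances:
        text = normalize(raw)
        if not listening:
            # Asleep: ignore ordinary bridge conversation, wait for the keyword.
            if text == ACTIVATE:
                listening = True
            continue
        if text == DEACTIVATE:
            listening = False  # back to sleep; the keyword itself is not forwarded
            continue
        forwarded.append(text)
    return forwarded


spoken = [
    "Coming up on the channel buoy.",
    "Helmsman.",
    "Right standard rudder.",
    "Steady as she goes.",
    "Very well.",
    "Let's get the anchor detail ready.",
]
print(gate_commands(spoken))  # ['right standard rudder', 'steady as she goes']
```

Because “Very well” is also the customary acknowledgment of a helmsman's report, a fielded system would likely need additional context to tell the two uses apart.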

Speech  recognition  software  is  sufficiently  technologically  advanced  to 
enable  VACS  to  clearly  receive  commands  from  the  conning  officer.  It  is  capable 
of  recognizing  and  transmitting  conning  commands  to  VACS  with  an  acceptable 
accuracy  rate.  COTS  SRS  is  a  feasible  solution  for  achieving  future  Navy 
mission  requirements. 

C.  RECOMMENDATIONS  FOR  FUTURE  STUDY 


The COTS SRS used for this study came straight out of the box with only
one change, the addition of ship-handling vocabulary. The study did not test all
features, some of which may have improved the results. The following is a list
of recommendations based on the study findings:

•  Perform a follow-on study on a U. S. Navy ship to determine the
potential impacts of a true shipboard environment, including ambient
noise differences.

•  Perform follow-on trials using advanced user options. One
untested advanced option was the ability to correct while speaking.
In this study, all corrections were made at the end of a trial vice
stopping the simulation and correcting immediately, which is a more
effective method of improving SRS performance. Another option,
which may have a profound impact, is a system that does not
include a vocabulary. Current COTS SRS offers such a system, in
which a language model exists but each individual user inserts the
necessary words, such as those included in Appendix A.

•  Investigate recording standard conning phrases, as opposed to
recording individual words, during enrollment to increase recognition
rates.

•  Increase the time allotted to subjects during the enrollment phase
to enable them to become more comfortable speaking to a
computer and wearing a wireless microphone.




The results of this study indicate COTS SRS is a viable alternative for
further evaluation on the high seas. As long as the components are
technologically advanced and employ the best features on the commercial
market, the system can support further testing. Legal and medical versions of
COTS SRS demonstrate that industry can modify the system to accommodate
very specific, high-profile applications, and a similar approach could be followed
for ship-handling operations. Specific applications require specific lexicons that
include only the words necessary to complete the task. An SRS with a small but
applicable lexicon is best suited for conning operations, because the smaller
lexicon reduces the opportunity for the software to choose a similar yet incorrect
word.
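A small lexicon also makes it easy to catch words that fall outside it before they reach the ship control system. The sketch below shows one simple way to use a restricted lexicon as a post-recognition check, assuming the recognizer returns plain text; it is not how DNSV6.0 or any other product constrains its search, and the lexicon shown is a hypothetical subset of the Appendix A vocabulary.

```python
# Hypothetical subset of the Appendix A vocabulary, stored as single words.
LEXICON = {
    "all", "ahead", "back", "full", "hard", "left", "right", "rudder",
    "standard", "steady", "as", "she", "goes", "stop", "amidships",
}


def out_of_vocabulary(command: str, lexicon=LEXICON):
    """List recognized words that are not in the restricted lexicon."""
    return [word for word in command.lower().split() if word not in lexicon]


for heard in ["Right standard rudder", "Right standard rather"]:
    unknown = out_of_vocabulary(heard)
    status = "pass to VACS" if not unknown else f"ask for repeat, unknown: {unknown}"
    print(f"{heard!r}: {status}")
```

Flagging the command for repetition, rather than guessing, keeps a misrecognized word such as “rather” from being acted on as a rudder order.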

There are numerous traditional and bureaucratic reasons for not
embracing a technology that does what humans have done for centuries.
However, the technology is available and ready, and the opportunity to explore
change exists. Further testing and evaluation of speech recognition software to
support ship control systems and processes can propel ship-handling from
practices of the days of sail and steam into the future of maneuvering warships
at sea.
APPENDIX  A.  SHIP-HANDLING  VOCABULARY


0     46             Knots
1     47             Lee Helm
2     48             Left
3     49             Magnetic
4     50             Maneuvering
5     1/3            Mark
6     2/3            Meet
7     Aft            Mind
8     Ahead          Minute
9     All            My
10    Amidships      New
11    Answers        No
12    APU            Nothing
13    APUs           Of
14    As             On
15    At             One Third
16    Automatic      Passing
17    Aye            Per
18    Back           Percent
19    Belay          Pitch
20    Bells          Port
21    Checking       Propulsion
22    Combinations   Revolutions
23    Continue       Right
24    Course         RPMs
25    Degrees        Rudder
26    Ease           Rudders
27    Emergency      Shaft
28    Engine         She
29    Engineroom     Shift
30    Engines        Sir
31    For            So
32    Full           Standard
33    Given          Starboard
34    Go             Steady
35    Goes           Steer
36    Hard           Stop
37    Head           The
38    Headings       To
39    Helm           Turns
40    Her            Two Thirds
41    How            Unit
42    Increase       Very
43    Indicate       Well
44    Is             You
45    Keep           Your
APPENDIX  B.  EXPERIMENT  RESULTS
Table  15.  Error  Types.


             Trial 1   Trial 2   Trial 3   Trial 4   Trial 5
Subject A    smd       smd       scd       scd       cud
Subject B    scd       smd       smd       smd       cmd
Subject C    cmd       cmd       ccd       smd       smd
Subject D    smf       suf       cud       ccd       cud
Subject E    smd       smd       cmd

SETTING            SCENARIO           VESSEL
S = SIMULATOR      C = CHANNEL        D = DESTROYER
C = CONSOLE        M = MOORING        F = FRIGATE
                   U = UNDERWAY

Table  16.  Conditions  Per  Subject  Per  Trial. 
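Each entry in Table 16 packs three conditions into a three-letter code, and the same letter can mean different things in different positions (C is Console as a setting but Channel as a scenario). The sketch below decodes the codes, assuming the letters follow the legend's column order (setting, scenario, vessel); that reading is an interpretation of the legend, and the function name is hypothetical.

```python
# Legend of Table 16, one dictionary per code position.
SETTING = {"s": "Simulator", "c": "Console"}
SCENARIO = {"c": "Channel", "m": "Mooring", "u": "Underway"}
VESSEL = {"d": "Destroyer", "f": "Frigate"}


def decode_condition(code: str) -> tuple:
    """Decode a three-letter code read as setting, scenario, vessel."""
    # Unpacking raises ValueError if the code is not exactly three letters.
    setting, scenario, vessel = code.strip().lower()
    return SETTING[setting], SCENARIO[scenario], VESSEL[vessel]


print(decode_condition("smd"))  # ('Simulator', 'Mooring', 'Destroyer')
print(decode_condition("cud"))  # ('Console', 'Underway', 'Destroyer')
```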




APPENDIX  C.  ACRONYMS 


ANOVA      Analysis of Variance
AOR        Replenishment Oiler
APU        Auxiliary Power Unit
ARPA       Automatic Radar Plotting Aid
ASR        Automatic Speech Recognition
BRM        Bridge Resource Management
CAPT       U. S. Navy Rank of Captain, O-6
CDR        U. S. Navy Rank of Commander, O-5
CG         Guided Missile Cruiser
COTS       Commercial-Off-The-Shelf
CVN        Aircraft Carrier, Nuclear Propulsion
DD         Destroyer
DDG        Destroyer (Guided Missile)
DNS        Dragon NaturallySpeaking
DNSV6.0    Dragon NaturallySpeaking Version 6.0
DoD        Department of Defense
ECDIS      Electronic Chart Display and Information System
FFG        Fast Frigate (Guided Missile)
HMM        Hidden Markov Models
IBS        Integrated Bridge System
IMO        International Maritime Organization
JV 2020    Joint Vision 2020
LCDR       U. S. Navy Rank of Lieutenant Commander, O-4
LCS        Littoral Combat Ship
LPD        Amphibious Transport Dock
LST        Landing Ship, Tank
MSI        Marine Safety International
MSO        Minesweeper, Ocean
NAGS       Non-Voice Activated Command System
NTR        Naval Transformation Roadmap
O-5        U. S. Navy Rank of Commander
O-6        U. S. Navy Rank of Captain
OOD        Officer of the Deck
OSC        Operations Specialist Chief
RFP        Request For Proposal
SALT       Speech Application Language Tags
SR         Speech Recognition
SRS        Speech Recognition Software
STWC       Standards of Training, Certification and Watchkeeping for Seafarers
SWO        Surface Warfare Officer
Tr.        Trimmed Data Set
TTS        Text-to-Speech
VACS       Voice Activated Command System
VAS        Voice Activated Systems
USCG       United States Coast Guard
W3C        World Wide Web Consortium
XML        Extensible Markup Language